
IntegraChat: Enterprise MCP Autonomous Agent Platform

Track: MCP in Action
Category: Enterprise
Tag: mcp-in-action-track-enterprise




Overview

IntegraChat is an enterprise-grade, multi-tenant AI platform that demonstrates the full capabilities of the Model Context Protocol (MCP) in a production-style environment. Built with enterprise governance and observability in mind, IntegraChat combines autonomous tool-using agents, RAG retrieval, live web search, and admin compliance under strict tenant isolation.

This platform showcases how MCP can power intelligent, governed, multi-tenant AI systems with real-time analytics, regex-based red-flag detection, and comprehensive tool orchestration.


🚀 Quick Start

Windows Users

# 1. Install dependencies
pip install -r requirements.txt

# 2. Configure environment (copy and edit .env)
copy env.example .env
# Edit .env with your credentials (Supabase, LLM, etc.)

# 3. Start all services
start.bat

Manual Setup

# 1. Install dependencies
pip install -r requirements.txt

# 2. Configure environment
cp env.example .env
# Edit .env with your credentials

# 3. Start FastAPI backend (Terminal 1)
uvicorn backend.api.main:app --port 8000 --reload

# 4. Start unified MCP server (Terminal 2)
python backend/mcp_server/server.py

# 5. Start Gradio UI (Terminal 3)
python app.py

Then access:

  • Gradio UI: http://localhost:7860
  • FastAPI Docs: http://localhost:8000/docs

Security Note: REST requests that hit protected endpoints must include both the x-tenant-id and x-user-role headers. The role (viewer, editor, admin, or owner) determines which actions the caller may perform, such as document ingestion, rule uploads, or analytics access.


Features

Core Capabilities

  • 🤖 Autonomous Multi-Step MCP Agents – Intelligent tool-aware agent that plans and executes multi-step workflows across RAG, Web, Admin, and LLM tools with short-term conversation memory
  • 💭 Short-Term Conversation Memory – Automatic memory system that stores the last N tool outputs per session with configurable expiration (default: 10 outputs, 15-minute TTL). Memory is keyed by session_id (not tenant_id) for safety, enabling better context awareness in multi-step workflows. Memory is automatically injected into tool payloads and cleared on session end.
  • 📚 Enhanced Knowledge Base Management – Upload raw text, URLs, or documents (PDF/DOCX/TXT/MD) with rich metadata (source URL, timestamp, document type) and optimized chunking (400-600 tokens)
  • 🤖 AI-Generated KB Metadata – Automatic extraction of title, summary, tags, topics, date, and quality score during document ingestion. LLM-powered, with a fallback that uses keyword extraction and pattern matching to provide useful metadata when the LLM is unavailable or times out
  • 🔍 Optimized RAG Search with Cross-Encoder Re-ranking – Two-stage retrieval: initial vector search followed by cross-encoder re-ranking of the top candidates using cross-encoder/ms-marco-MiniLM-L-6-v2 to improve ranking accuracy. Semantic search with a configurable similarity threshold (default 0.3) for better recall
  • ⚡ Per-Tool Latency Prediction – Agent estimates expected latency before choosing tools (RAG: 60-120ms, Web: 400-1800ms, Admin: <20ms) to optimize tool selection and choose the fastest path
  • 🧠 Context-Aware MCP Routing – Intelligent tool selection based on previous outputs: skip web search if RAG returns a high score (≥0.8), skip agent reasoning for critical admin violations, skip RAG if relevant memory is already available. Leads to more sophisticated behavior and higher scores
  • 📋 Tool Output Schemas – Every tool returns strict JSON type schemas for easier debugging, cleaner reasoning, and more polished responses. Automatic schema validation and formatting
  • 🗑️ Document Management – Delete individual documents or bulk delete all documents for a tenant with confirmation dialogs
  • 🛡️ Enterprise Admin Governance – Advanced rule management system with:
    • Regex-based red-flag pattern matching with severity levels (low/medium/high/critical)
    • Automatic admin alerts for violations
    • LLM-Enhanced Rules: Rules are automatically analyzed and enhanced to identify edge cases, improve regex patterns, and suggest appropriate severity levels
    • LLM-Guided Rule Explanations: Automatic generation of human-readable explanations, concrete examples, and missing pattern suggestions. Includes intelligent fallback when LLM is unavailable - uses keyword extraction to provide useful explanations even during timeouts
    • File Upload Support: Upload rules from TXT, PDF, DOC, or DOCX files with drag-and-drop interface
    • Chunk Processing: Large rule sets processed in manageable chunks (5 rules at a time) to prevent timeouts
    • Rule-Based Behavior Control: Rules checked FIRST - brief response rules return quick answers, blocking rules prevent requests
    • Comment Filtering: Comment lines (starting with #) automatically ignored when uploading rules
    • Supabase Integration: Rules stored in Supabase for production scalability (with SQLite fallback)
  • 📊 Comprehensive Analytics & Observability – Full tenant-level analytics logging with Supabase backend (SQLite fallback for local dev):
    • Tool usage breakdown (RAG, Web, Admin, LLM) with latency and token tracking
    • RAG recall/precision indicators (average hits, scores, top scores)
    • Per-tenant query volume and active users
    • Red-flag violations with timestamps and confidence scores
    • LLM token logs and latency metrics
    • Real-Time Visualizations: Reasoning path visualizer, tool invocation timeline, and tenant activity heatmap
  • 🌐 Live Web Search – Google Programmable Search (Custom Search API) with tenant-aware MCP tooling
  • 🏢 Multi-Tenant Isolation – Complete tenant isolation with centralized tenant ID management; backend enforces strict isolation for chat, ingestion, and admin ops
  • 🔐 Fine-Grained Role-Based Access Control (RBAC) – Four-tier role system (viewer, editor, admin, owner) with backend permission enforcement
  • 🔄 Intelligent Multi-Tool Orchestration – MCP agent orchestrator autonomously selects optimal tool chains (RAG + Web + LLM, etc.) based on query intent, context, latency predictions, and previous tool outputs. Context-aware routing enables sophisticated tool skipping for efficiency
  • ⚡ Robust Error Handling – Structured error responses, retry mechanisms, and graceful fallbacks (e.g., if RAG fails → fall back to LLM-only)
  • 📡 Streaming Responses – Chat responses stream word-by-word using Server-Sent Events (SSE) for real-time user experience
  • 🎯 Rule-First Processing – Admin rules checked before intent classification - rules can trigger brief responses or block requests entirely
  • 🧠 Advanced Context Engineering – Implements Anthropic's context engineering strategies:
    • High-Fidelity Compaction: Automatically compresses conversations at 80% token threshold, preserving architectural decisions and unresolved issues
    • Tool Result Clearing: Safest form of compaction - removes large tool outputs while keeping metadata
    • Structured Note-Taking: Tracks objectives, architectural decisions, and unresolved issues outside context window
    • XML-Structured Prompts: All prompts use clear XML sections for better model understanding
    • Just-in-Time Context Loading: Selects only relevant memories and tools for each query
    • Progressive Disclosure: Agents discover context incrementally through exploration
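
The latency-prediction and context-aware routing bullets above can be illustrated with a small sketch. It is not the project's orchestrator: the `RoutingContext` class and `plan_tools` function are hypothetical names, and only the latency ranges and the 0.8 score threshold come from the feature list.

```python
from dataclasses import dataclass
from typing import Optional

# Latency ranges in milliseconds quoted in the feature list (Admin is "<20ms").
LATENCY_MS = {"admin": (0, 20), "rag": (60, 120), "web": (400, 1800)}
RAG_SKIP_WEB_SCORE = 0.8  # skip web search when RAG already scored this high


@dataclass
class RoutingContext:
    """Hypothetical summary of what the agent already knows for this query."""
    top_rag_score: Optional[float] = None  # best RAG score so far, None if RAG has not run
    critical_violation: bool = False       # admin check found a critical rule hit
    has_relevant_memory: bool = False      # session memory already covers the query


def plan_tools(ctx: RoutingContext) -> list[str]:
    """Return the tools still worth calling, lowest expected latency first."""
    if ctx.critical_violation:
        return []  # critical admin violation: no further tool calls
    tools = []
    if ctx.top_rag_score is None and not ctx.has_relevant_memory:
        tools.append("rag")
    if ctx.top_rag_score is None or ctx.top_rag_score < RAG_SKIP_WEB_SCORE:
        tools.append("web")
    # Order by the midpoint of each tool's expected latency range.
    return sorted(tools, key=lambda name: sum(LATENCY_MS[name]) / 2)
```

For instance, `plan_tools(RoutingContext(top_rag_score=0.85))` returns an empty list, because RAG has already run and its score clears the threshold for skipping web search.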

Enterprise Features

  • 🔍 Regex-Based Red-Flag Detection – Support for complex regex patterns with keyword fallback and semantic scoring
  • 🤖 LLM-Enhanced Rule Management – Rules automatically enhanced by LLM to identify edge cases, improve patterns, and suggest severity levels. Includes intelligent fallback explanations when LLM is unavailable - uses keyword extraction to generate useful explanations, examples, and pattern suggestions even during timeouts
  • 📄 File Upload & Drag-and-Drop – Upload rules from files (TXT, PDF, DOC, DOCX) with intuitive drag-and-drop interface
  • ⚡ Chunk-Wise Processing – Large rule sets processed in chunks to prevent timeouts and ensure reliable processing
  • 📈 Real-Time Analytics Dashboard – Per-tenant analytics with configurable time windows (7, 30, 90 days)
  • 🛠️ Admin API Endpoints – /admin/violations, /admin/tools/logs, /admin/tenants for comprehensive governance
  • 🧠 Agent Debug & Planning – /agent/debug and /agent/plan endpoints for observability and tool selection inspection
  • 💾 Persistent Analytics Storage – Supabase-backed analytics store (with automatic SQLite fallback) for fast, multi-tenant queries
  • 🗄️ Supabase Integration – Production-ready Supabase support for admin rules with automatic table creation
  • 📈 Real-Time Visualization Components – Interactive visualizations for agent reasoning, tool execution, and tenant activity:
    • Reasoning Path Visualizer: Step-by-step visualization of agent decision-making with animated progression
    • Tool Invocation Timeline: Visual timeline showing tool execution order, latency, and result counts
    • Tenant Activity Heatmap: Query activity heatmap and per-tool usage trends over time
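
To make the regex-based red-flag detection with keyword fallback more concrete, here is a minimal sketch. The `Rule` class, the `check_message` function, and the keyword heuristic (all words of five or more letters must appear) are assumptions for illustration; semantic scoring is left out, and the real matcher may work differently.

```python
import re
from dataclasses import dataclass
from typing import Optional

SEVERITY_ORDER = ["low", "medium", "high", "critical"]


@dataclass
class Rule:
    """Hypothetical rule record: free text plus optional regex and severity."""
    text: str
    severity: str = "medium"
    pattern: Optional[str] = None


def _keyword_match(rule_text: str, message: str) -> bool:
    # Simplistic fallback: every word of 5+ letters in the rule must appear.
    keywords = re.findall(r"[a-z]{5,}", rule_text.lower())
    lowered = message.lower()
    return bool(keywords) and all(word in lowered for word in keywords)


def check_message(message: str, rules: list[Rule]) -> list[Rule]:
    """Return violated rules, most severe first."""
    hits = []
    for rule in rules:
        matched = False
        if rule.pattern:
            try:
                matched = re.search(rule.pattern, message, re.IGNORECASE) is not None
            except re.error:
                # Invalid regex: fall back to keyword matching on the rule text.
                matched = _keyword_match(rule.text, message)
        else:
            matched = _keyword_match(rule.text, message)
        if matched:
            hits.append(rule)
    return sorted(hits, key=lambda r: SEVERITY_ORDER.index(r.severity), reverse=True)
```

A rule with a pattern is matched by regex only; the keyword path is used when no pattern is set or when the pattern fails to compile.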

Conversation Memory System

IntegraChat includes a short-term conversation memory system that enhances multi-step workflows by maintaining context across tool calls:

  • Automatic Storage: Every tool output is automatically stored in memory for the session
  • Bounded Size: Keeps only the last N tool outputs (configurable via MCP_MEMORY_MAX_ITEMS, default: 10)
  • Auto-Expiration: Entries automatically expire after a configurable TTL (via MCP_MEMORY_TTL_SECONDS, default: 900 seconds / 15 minutes)
  • Session-Based: Memory is keyed by session_id (not tenant_id) for safety and isolation
  • Automatic Injection: Recent memory is automatically injected into tool payloads as a memory field for multi-step workflows
  • Session Clearing: Memory can be explicitly cleared by sending end_session: true or endSession: true in the payload

Usage Example:

{
  "tenant_id": "acme",
  "session_id": "chat-abc-123",
  "query": "Search for X"
}

Subsequent tool calls with the same session_id will receive a memory field containing recent tool outputs, enabling tools to make context-aware decisions in multi-step workflows.

Configuration:

  • MCP_MEMORY_MAX_ITEMS: Maximum number of tool outputs to keep per session (default: 10)
  • MCP_MEMORY_TTL_SECONDS: Time-to-live for memory entries in seconds (default: 900)
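
The behavior described above (bounded size, TTL expiry, session keying, explicit clearing) can be sketched as follows. This is an illustrative in-process store, not the project's implementation; the `SessionMemory` class and its method names are hypothetical, while the two environment variables and their defaults are the ones listed above.

```python
import os
import time
from collections import deque


class SessionMemory:
    """Keeps the last N tool outputs per session_id and expires them after a TTL."""

    def __init__(self, max_items=None, ttl_seconds=None, clock=time.monotonic):
        # Defaults mirror the documented configuration values.
        self.max_items = max_items or int(os.environ.get("MCP_MEMORY_MAX_ITEMS", "10"))
        self.ttl_seconds = ttl_seconds or float(os.environ.get("MCP_MEMORY_TTL_SECONDS", "900"))
        self._clock = clock
        self._store = {}  # session_id -> deque of (timestamp, tool, output)

    def add(self, session_id, tool, output):
        # deque(maxlen=...) silently drops the oldest entry when full.
        items = self._store.setdefault(session_id, deque(maxlen=self.max_items))
        items.append((self._clock(), tool, output))

    def recent(self, session_id):
        """Return unexpired entries, oldest first, ready to inject as a memory field."""
        now = self._clock()
        items = self._store.get(session_id, ())
        return [
            {"tool": tool, "output": output}
            for stamp, tool, output in items
            if now - stamp <= self.ttl_seconds
        ]

    def clear(self, session_id):
        # Corresponds to a payload carrying end_session: true.
        self._store.pop(session_id, None)
```

Keying on session_id rather than tenant_id means two users of the same tenant never see each other's tool outputs.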

Role-Based Access Control (RBAC)

IntegraChat implements fine-grained role-based access control (RBAC) for backend API endpoints. This ensures that users can only access features appropriate for their role level.

Roles

The system supports four roles with increasing privileges:

  1. viewer (default) - Basic read-only access

    • Can use chat functionality
    • Cannot ingest documents
    • Cannot delete documents
    • Cannot view analytics
    • Cannot manage admin rules
  2. editor - Content management access

    • Can use chat functionality
    • ✅ Can ingest documents (upload, paste, URLs, files)
    • ❌ Cannot delete documents
    • ❌ Cannot view analytics
    • ❌ Cannot manage admin rules
  3. admin - Administrative access

    • Can use chat functionality
    • ✅ Can ingest documents
    • ✅ Can delete documents
    • ✅ Can view analytics
    • ✅ Can manage admin rules
  4. owner - Full system access

    • Same permissions as admin (highest privilege level)

Permission Matrix

| Action | viewer | editor | admin | owner |
|---|---|---|---|---|
| Chat Bot | ✅ | ✅ | ✅ | ✅ |
| Ingest Documents | ❌ | ✅ | ✅ | ✅ |
| Delete Documents | ❌ | ❌ | ✅ | ✅ |
| View Analytics | ✅ | ✅ | ✅ | ✅ |
| Manage Rules | ❌ | ❌ | ✅ | ✅ |

Backend RBAC

Backend API endpoints enforce RBAC through the x-user-role header:

# Permission matrix in backend/mcp_server/common/access_control.py
PERMISSIONS = {
    "manage_rules": {"owner", "admin"},
    "ingest_documents": {"owner", "admin", "editor"},
    "delete_documents": {"owner", "admin"},
    "view_analytics": {"owner", "admin"},
}

Protected Endpoints:

  • /admin/rules - Requires admin or owner role
  • /rag/ingest* - Requires editor, admin, or owner role
  • /rag/delete* - Requires admin or owner role
  • /analytics/* - All roles can view (viewer, editor, admin, owner)

Role Propagation: The user role is automatically propagated through the entire request pipeline:

  1. Client sends x-user-role header
  2. Backend API route receives and validates role
  3. Role is passed to service layer (process_ingestion(), etc.)
  4. Service layer passes role to MCP clients
  5. MCP clients include role in payload to MCP server
  6. MCP server extracts role and enforces permissions

Example Request:

curl -X POST "http://localhost:8000/admin/rules" \
  -H "Content-Type: application/json" \
  -H "x-tenant-id: tenant123" \
  -H "x-user-role: admin" \
  -d '{"rule": "Do not share passwords"}'

If the role lacks permission, the API returns 403 Forbidden with a descriptive error message that includes:

  • Which role was used
  • Which roles are allowed for the action
  • Instructions to change role in the UI
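
A minimal sketch of how such a check and its 403 message could be built on the PERMISSIONS mapping shown earlier. The helper names (`resolve_role`, `require_permission`, `PermissionDenied`) are hypothetical, and treating a missing or unknown role as viewer is an assumption based on viewer being the default role.

```python
ROLES = ("viewer", "editor", "admin", "owner")

# Copied from the access_control.py snippet above.
PERMISSIONS = {
    "manage_rules": {"owner", "admin"},
    "ingest_documents": {"owner", "admin", "editor"},
    "delete_documents": {"owner", "admin"},
    "view_analytics": {"owner", "admin"},
}


class PermissionDenied(Exception):
    """Carries the pieces of the 403 response described above."""

    def __init__(self, role, action, allowed):
        self.status_code = 403
        self.detail = (
            f"Role '{role}' is not allowed to perform '{action}'. "
            f"Allowed roles: {', '.join(sorted(allowed))}. "
            "Change your role in the UI and try again."
        )
        super().__init__(self.detail)


def resolve_role(headers):
    """Read x-user-role from a plain dict of headers; assume viewer if absent or unknown."""
    role = headers.get("x-user-role", "viewer").strip().lower()
    return role if role in ROLES else "viewer"


def require_permission(role, action):
    """Raise PermissionDenied unless the role may perform the action."""
    allowed = PERMISSIONS[action]
    if role not in allowed:
        raise PermissionDenied(role, action, allowed)
```

Calling `require_permission(resolve_role(headers), "ingest_documents")` at the top of a handler keeps the check in one place for every protected route.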

Using RBAC

  1. Set Role: Include x-user-role header in API requests with one of: viewer, editor, admin, or owner
  2. Verify Permissions: Backend enforces role-based access automatically
  3. Error Handling: API returns 403 Forbidden with clear error messages when role lacks required permissions

Real-Time Visualization Features

IntegraChat includes three powerful visualization components that provide real-time insights into agent behavior and system activity:

1. Reasoning Path Visualizer

  • What it shows: Step-by-step visualization of how the agent makes decisions
  • Features:
    • Animated progression through reasoning steps
    • Status indicators (pending, running, completed, error)
    • Detailed metrics per step (latency, hit counts, token estimates)
    • Visual icons for each step type
  • Where to find it:
    • Gradio app: Debug & Reasoning tab
  • Data source: reasoning_trace from agent responses

2. Tool Invocation Timeline

  • What it shows: Visual timeline of all tool executions during an agent interaction
  • Features:
    • Color-coded bars showing tool status (success/error)
    • Latency visualization per tool
    • Result count badges
    • Summary statistics (total tools, total time, average latency)
  • Where to find it:
    • Gradio app: Debug & Reasoning tab
  • Data source: tool_traces from agent responses

3. Tenant Activity Heatmap

  • What it shows: Query activity patterns and tool usage trends over time
  • Features:
    • Hour-by-hour, day-by-day activity heatmap
    • Color intensity based on activity level
    • Per-tool usage trends with bar charts
    • Trend indicators (up/down/stable)
  • Where to find it:
    • Gradio app: Admin Analytics tab
    • Configurable time window (default: 7 days)
  • Data source: /analytics/activity and /analytics/tool-usage endpoints

Access: All visualization features are available to all roles (viewer, editor, admin, owner).


Installation & Setup

Prerequisites

  • Python 3.10+ with pip
  • PostgreSQL (with pgvector extension) or Supabase for RAG storage
  • Supabase (recommended) or SQLite for admin rules and analytics
  • Ollama (local) or Groq API credentials for LLM
  • Google Custom Search API (optional, for web search):
    • Enable Custom Search API in Google Cloud Console
    • Create API key → set as GOOGLE_SEARCH_API_KEY in .env
    • Create Programmable Search Engine → set ID as GOOGLE_SEARCH_CX_ID in .env

Step-by-Step Installation

  1. Clone and navigate to the project:

    cd IntegraChat
    
  2. Create and activate virtual environment (recommended):

    # Windows
    python -m venv venv
    venv\Scripts\activate
    
    # Linux/Mac
    python3 -m venv venv
    source venv/bin/activate
    
  3. Install Python dependencies:

    pip install -r requirements.txt
    
  4. Configure environment variables:

    cp env.example .env
    # Edit .env with your credentials:
    # - SUPABASE_URL and SUPABASE_SERVICE_KEY (for production storage)
    # - POSTGRESQL_URL (for RAG vector database)
    # - OLLAMA_URL/OLLAMA_MODEL or GROQ_API_KEY (for LLM)
    # - GOOGLE_SEARCH_API_KEY and GOOGLE_SEARCH_CX_ID (optional, for web search)
    
  5. Set up Supabase (recommended for production):

    • Create a Supabase project at supabase.com
    • Run supabase_admin_rules_table.sql in Supabase SQL Editor
    • Run supabase_analytics_tables.sql in Supabase SQL Editor
    • Copy your project URL and service role key to .env
    • Verify setup: python verify_supabase_setup.py
  6. Start the services:

    Option A: Windows Quick Start (recommended for Windows):

    start.bat
    

    This automatically starts:

    • FastAPI backend on port 8000
    • Unified MCP server on port 8900

    Option B: Manual Start:

    # Terminal 1: FastAPI backend
    uvicorn backend.api.main:app --port 8000 --reload
    
    # Terminal 2: Unified MCP server
    python backend/mcp_server/server.py
    
  7. Launch the UI:

    Gradio Interface (full-featured):

    python app.py
    

    Access at http://localhost:7860

Usage

Gradio Interface (app.py)

The Gradio UI provides a comprehensive interface with five main tabs:

1. Chat 💬

  • Enter your Tenant ID and start chatting with the MCP-powered agent
  • Real-time streaming responses (word-by-word using SSE)
  • Autonomous tool orchestration (RAG, Web, Admin, LLM)
  • Multi-step planning with memory of previous tool outputs

2. Document Ingestion 📚

  • Raw Text: Paste text directly
  • URL: Ingest content from web URLs
  • File Upload: Upload PDF, DOCX, TXT, or Markdown files
  • Rich metadata support (filename, URL, document ID, custom JSON)
  • View and manage ingested documents

3. Knowledge Base Library 📖

  • Statistics Dashboard: Visual cards showing document counts by type
  • Interactive Charts: Plotly pie chart for document type distribution
  • Semantic Search: Search knowledge base with relevance scoring
  • Type Filtering: Filter by document type (text, PDF, FAQ, link)
  • Document Management: View, preview, and delete documents
  • Auto-refresh: Lists update automatically after operations

4. Admin Analytics 📊

  • Statistics Cards: Total queries, active users, red flags, RAG searches
  • Interactive Bar Charts:
    • Tool Usage Count (RAG, Web, Admin, LLM)
    • Average Tool Latency (performance metrics)
    • RAG Quality Metrics (hits, scores, recall indicators)
  • Tool Usage Table: Detailed performance breakdown
  • Formatted Summary: Key metrics in easy-to-read format
  • Click "🔄 Fetch Analytics Snapshot" to load latest data

5. Admin Rules & Compliance 🛡️

  • Text Input: Paste rules one per line (comments starting with # are ignored)
  • File Upload: Upload rules from TXT, PDF, DOC, or DOCX files
  • LLM Enhancement: Automatic rule enhancement (edge cases, pattern improvements, severity suggestions)
  • Chunk Processing: Large rule sets processed in chunks (5 at a time)
  • Rule-Based Behavior: Rules checked FIRST - brief responses or blocking based on severity
  • Streaming Responses: Real-time word-by-word streaming
  • Refresh Button: Update rules table directly
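
The rule input handling described in this tab (one rule per line, # comments ignored, batches of five) can be sketched in a few lines. The function names are hypothetical and the sketch covers only parsing and batching, not LLM enhancement.

```python
def parse_rules(text):
    """Split pasted text into rules, dropping blank lines and # comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            rules.append(line)
    return rules


def chunked(items, size=5):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Twelve rules would be sent as three chunks of 5, 5, and 2, so a slow enhancement call delays at most one chunk rather than the whole upload.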

💡 Tip: Every action requires a Tenant ID. The Tenant ID persists across page refreshes and is managed centrally.


API Endpoints

All endpoints are served by the FastAPI backend at http://localhost:8000. Most endpoints require the x-tenant-id header for tenant isolation.

📖 API Documentation: Interactive Swagger docs available at http://localhost:8000/docs when the backend is running.

Agent Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /agent/message | Main chat endpoint with tenant_id, message, optional history |
| POST | /agent/message/stream | Streaming chat endpoint using Server-Sent Events (SSE). Returns tokens word-by-word |
| POST | /agent/debug | Detailed debugging info: reasoning trace, tool selection, intent classification |
| POST | /agent/plan | Tool selection plan without execution (intent, tool scores, planned steps) |
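
A client consuming /agent/message/stream has to parse the SSE wire format. The sketch below handles the standard framing (data: lines, blank-line event boundaries, comment lines starting with a colon) and assumes each event carries a plain text fragment; the actual payload shape of the endpoint is not specified here, so treat that as an assumption.

```python
def iter_sse_data(lines):
    """Yield the data payload of each Server-Sent Event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The format strips one optional leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
    if buffer:
        # Lenient choice: flush a trailing event that lacks a closing blank line.
        yield "\n".join(buffer)
```

Joining the yielded fragments in order reconstructs the streamed answer.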

RAG Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /rag/ingest-document | Ingest document with source_type, content, metadata. Supports raw text, URLs, PDFs, DOCX, TXT, Markdown |
| POST | /rag/ingest-file | Multipart file upload (PDF/DOCX/TXT/MD) with x-tenant-id header |
| GET | /rag/list?tenant_id={id}&limit={n}&offset={n} | List all documents for a tenant with pagination |
| DELETE | /rag/delete/{document_id}?tenant_id={id} | Delete a specific document by ID |
| DELETE | /rag/delete-all?tenant_id={id} | Delete all documents for a tenant |

Note: RAG endpoints support both x-tenant-id header and tenant_id query parameter.

Admin & Governance Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | /admin/rules?detailed=true | Get all rules (use detailed=true for regex/severity metadata) |
| POST | /admin/rules?enhance=true | Add single rule with optional pattern (regex), severity, description. Set enhance=true for LLM enhancement |
| POST | /admin/rules/bulk?enhance=true | Add multiple rules at once (processed in chunks of 5). LLM enhancement applied automatically |
| POST | /admin/rules/upload-file?enhance=true | Upload rules from file (TXT, PDF, DOC, DOCX). Text extracted server-side |
| DELETE | /admin/rules/{rule} | Delete a specific rule |
| GET | /admin/violations?days=30&limit=50 | Get red-flag violations with timestamps and confidence scores |
| GET | /admin/tools/logs?tool_name=rag&days=7 | Get detailed tool usage logs with latency and token counts |
| GET/POST/DELETE | /admin/tenants | Tenant management endpoints |
| POST | /admin/setup/table | Create admin_rules table in Supabase if it doesn't exist |

Analytics Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | /analytics/overview?days=30 | Comprehensive analytics: total queries, tool usage, red-flag count, RAG quality |
| GET | /analytics/tool-usage?days=30 | Detailed tool usage stats: counts, latency, tokens, success/error rates |
| GET | /analytics/redflags?limit=50&days=30 | Recent red-flag violations for tenant |
| GET | /analytics/activity?days=30 | Tenant activity summary: queries, active users, last query timestamp |
| GET | /analytics/rag-quality?days=30 | RAG quality metrics: avg hits, scores, latency (recall/precision indicators) |

Visualization Features

IntegraChat includes three powerful visualization components that provide real-time insights into agent behavior and system activity:

1. Real-Time Reasoning Visualizer

  • Location: Debug tab (Gradio app)
  • Features:
    • Step-by-step visualization of agent reasoning path
    • Animated progression through reasoning steps
    • Status indicators (pending, running, completed, error)
    • Detailed metrics per step (latency, hit counts, token estimates)
    • Visual icons for each step type (admin rules check, RAG prefetch, tool selection, etc.)
  • Data Source: reasoning_trace from /agent/message or /agent/debug endpoints
  • Usage: Automatically appears in chat panel when agent responses include reasoning traces

2. Tool Invocation Timeline

  • Location: Debug tab (Gradio app)
  • Features:
    • Visual timeline showing tool execution order
    • Color-coded bars indicating tool status (success/error)
    • Latency visualization per tool
    • Result count badges
    • Summary statistics (total tools, total time, average latency)
  • Data Source: tool_traces from /agent/message or /agent/debug endpoints
  • Usage: Automatically appears in chat panel when agent responses include tool traces

3. Live Tenant Heatmap

  • Location: Analytics page (/analytics)
  • Features:
    • Query activity heatmap (hour-by-hour, day-by-day visualization)
    • Color intensity based on activity level
    • Per-tool usage trends with bar charts
    • Trend indicators (up/down/stable)
    • Configurable time window (default: 7 days)
  • Data Source: /analytics/activity and /analytics/tool-usage endpoints
  • Usage: Navigate to Analytics page to view tenant activity patterns

Access: All visualization features are available to all roles (viewer, editor, admin, owner).

Request Headers

Most endpoints require:

  • x-tenant-id: Tenant identifier for multi-tenant isolation
  • x-user-role: Caller role for RBAC enforcement (viewer, editor, admin, or owner)
    • Important: The role must be passed through the entire pipeline (UI → API → RAG client → MCP server)
    • The backend propagates the role automatically from the incoming API request to the RAG client and finally to the MCP server, which performs the permission checks
    • If ingestion fails with permission errors, verify the role is set correctly in the UI and check backend logs for role propagation debug messages
  • Content-Type: application/json: For POST requests with JSON payloads

Example Request

curl -X POST http://localhost:8000/agent/message \
  -H "Content-Type: application/json" \
  -H "x-tenant-id: tenant123" \
  -d '{
    "message": "What is our refund policy?",
    "tenant_id": "tenant123"
  }'
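
The same request can be built from Python with only the standard library. This is a sketch: the `build_agent_request` helper is a hypothetical name, and the x-user-role header is added on the assumption that the endpoint is treated like other protected routes.

```python
import json
import urllib.request


def build_agent_request(message, tenant_id, role="viewer", base_url="http://localhost:8000"):
    """Build (but do not send) a POST to /agent/message with the tenant and role headers."""
    body = json.dumps({"message": message, "tenant_id": tenant_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/agent/message",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-tenant-id": tenant_id,
            "x-user-role": role,
        },
    )
```

With the backend running, `urllib.request.urlopen(build_agent_request("What is our refund policy?", "tenant123"))` sends the request and returns the HTTP response object.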

Architecture

System Overview

IntegraChat follows a modular architecture with clear separation of concerns:

┌─────────────────┐
│   Frontend UI   │  (Gradio)
│    Port 7860    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  FastAPI Backend│  (API Gateway)
│    Port 8000    │
└────────┬────────┘
         │
         ├──► Unified MCP Server (Port 8900)
         │    ├── RAG Tools (search, ingest, list, delete)
         │    ├── Web Tools (search)
         │    └── Admin Tools (rules, violations)
         │
         ├──► PostgreSQL/Supabase (RAG Vector Store)
         ├──► Supabase/SQLite (Rules & Analytics)
         └──► LLM Backend (Ollama/Groq)

Enterprise-Grade Features

  1. Autonomous Multi-Step Planning: LLM-powered planning determines optimal tool sequences with short-term conversation memory that stores and injects previous tool outputs into subsequent tool calls for better context awareness.

  2. Regex-Based Governance: Admin rules support regex patterns with fallback to keyword matching and semantic similarity scoring for flexible policy enforcement.

  3. Comprehensive Analytics: All tool usage, RAG searches, LLM calls, and red-flag violations are logged with indexed queries for fast analytics retrieval.

  4. Enhanced RAG Pipeline: Documents chunked optimally (400-600 tokens) and enriched with metadata (source URL, timestamp, document type) for better retrieval.

  5. Structured Error Handling: All errors logged with context, with graceful fallbacks (e.g., RAG fails → LLM-only, web fails → skip web).
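
The graceful-fallback idea in item 5 can be sketched as below. The function and its callable parameters (`rag_search`, `web_search`, `llm_generate`) are hypothetical stand-ins for the real tool clients; the point is only that a failing retrieval tool is recorded and skipped rather than aborting the answer.

```python
def answer_with_fallback(query, rag_search, web_search, llm_generate):
    """Collect context from RAG and web, skipping any tool that fails, then call the LLM."""
    context = []
    errors = []
    for name, tool in (("rag", rag_search), ("web", web_search)):
        try:
            context.extend(tool(query))
        except Exception as exc:  # broad on purpose: any tool failure should degrade, not crash
            errors.append({"tool": name, "error": str(exc)})
    return {
        "answer": llm_generate(query, context),  # LLM-only when context is empty
        "context_used": len(context),
        "errors": errors,
    }
```

Returning the collected errors alongside the answer keeps the failure visible to logging and analytics even though the user still gets a response.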

Data Storage Architecture

IntegraChat uses dual-backend storage with automatic fallback for production flexibility:

Supabase (Production/Preferred)

When to use: Production deployments, multi-user environments, scalable applications

Storage:

  • admin_rules - Admin rules with regex patterns and severity levels
  • tool_usage_events - Tool invocation logs with latency and token tracking
  • redflag_violations - Red-flag violation events with timestamps
  • rag_search_events - RAG search metrics and quality indicators
  • agent_query_events - Agent query logs and analytics

Features:

  • Row Level Security (RLS) for multi-tenant isolation
  • Automatic backups and scaling
  • Real-time capabilities
  • Production-ready infrastructure

Setup: Configure SUPABASE_URL and SUPABASE_SERVICE_KEY in .env

SQLite (Development Fallback)

When to use: Local development, testing, single-user scenarios

Storage:

  • data/admin_rules.db - Admin rules (local file)
  • data/analytics.db - Analytics events (local file)

Features:

  • Zero configuration required
  • Perfect for local development
  • Automatic fallback when Supabase not configured

Migration: To migrate existing SQLite data to Supabase, refer to Supabase documentation for data migration strategies.


Supabase Setup & Migration

IntegraChat supports Supabase for production-ready storage of admin rules and analytics. Both RulesStore and AnalyticsStore automatically detect and use Supabase when credentials are available, falling back to SQLite for local development.
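
The detection logic can be pictured with a short sketch. It is not the actual RulesStore or AnalyticsStore code; `select_backend` is a hypothetical helper that assumes the choice depends only on the two Supabase variables being present and non-empty.

```python
import os


def select_backend(env=None):
    """Return "supabase" when both credentials are set, otherwise "sqlite"."""
    env = os.environ if env is None else env
    url = (env.get("SUPABASE_URL") or "").strip()
    key = (env.get("SUPABASE_SERVICE_KEY") or "").strip()
    return "supabase" if url and key else "sqlite"
```

Logging the returned value at startup is a quick way to confirm which store a given deployment is actually using.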

Quick Setup

  1. Create Supabase tables:

    • Run supabase_admin_rules_table.sql in Supabase SQL Editor
    • Run supabase_analytics_tables.sql in Supabase SQL Editor
  2. Configure environment variables in .env:

    SUPABASE_URL=https://your-project-id.supabase.co
    SUPABASE_SERVICE_KEY=your_service_role_key_here
    
  3. Verify setup: Check that your Supabase project is accessible and tables are created correctly.


Troubleshooting

Common Issues

Backend Not Starting

  • Issue: FastAPI backend fails to start
  • Solution:
    • Check if port 8000 is already in use: netstat -ano | findstr :8000 (Windows) or lsof -i :8000 (Linux/Mac)
    • Verify Python virtual environment is activated
    • Check .env file exists and has required variables
    • Review error logs for missing dependencies
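
As a cross-platform alternative to netstat or lsof, a few lines of Python can tell whether something is already listening on a port. The `port_in_use` helper is a hypothetical name, and the check only covers TCP on the given host.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

Running `port_in_use(8000)` before starting uvicorn shows whether another process already holds the backend port.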

MCP Server Connection Errors

  • Issue: "Could not connect to MCP server" errors
  • Solution:
    • Ensure unified MCP server is running: python backend/mcp_server/server.py
    • Check MCP server is on port 8900 (default)
    • Verify MCP_SERVER_ID in .env matches server configuration
    • Check firewall settings if running on different machines

RAG Search Not Returning Results

  • Issue: RAG searches return no results despite ingested documents
  • Solution:
    • Check similarity threshold (default 0.3) - try lowering to 0.2 or 0.1
    • Verify documents exist: GET /rag/list?tenant_id={id}
    • Ensure tenant_id matches between ingestion and search
    • Check PostgreSQL/pgvector connection and vector extension
    • Review MCP server logs for search metrics

Supabase Configuration Issues

  • Issue: Data still going to SQLite instead of Supabase
  • Solution:
    • Verify SUPABASE_URL and SUPABASE_SERVICE_KEY are set in .env (no quotes, no spaces)
    • Use the service_role key (not the anon key) from the Supabase Dashboard
    • Ensure tables exist: run SQL scripts in Supabase SQL Editor
    • Check FastAPI startup logs for backend detection messages

LLM Connection Errors

  • Issue: Agent responses fail with LLM errors
  • Solution:
    • For Ollama: Ensure Ollama is running (ollama serve)
    • Check OLLAMA_URL and OLLAMA_MODEL in .env
    • For Groq: Verify GROQ_API_KEY is set correctly
    • Check LLM_BACKEND setting (ollama or groq)
    • Test LLM connection: curl http://localhost:11434/api/tags (Ollama)

Document Ingestion Failures

  • Issue: File uploads or document ingestion fails
  • Solution:
    • Check file size limits (default may be 10MB)
    • Verify file format is supported (PDF, DOCX, TXT, MD)
    • Ensure tenant_id is provided in request
    • Check user role: Ingestion requires editor, admin, or owner role. If you see "Permission Denied (403)", change your role in the UI dropdown (top right) from "viewer" to "editor", "admin", or "owner"
    • Verify x-user-role header is being sent correctly (check backend logs for debug messages)
    • Check backend logs for specific error messages
    • Verify PostgreSQL connection for RAG storage

Document Display Issues

  • Issue: Document list shows [object Object] instead of document details
  • Solution: This has been fixed. Documents now display properly with:
    • Document ID (number)
    • Document Type (text, pdf, faq, link)
    • Preview (first 200 characters)
    • Length (character count)
    • Created date
  • If still seeing issues: Refresh the Knowledge Base Library tab

Rule Addition Timeouts

  • Issue: "Chunk 1/1 timed out after 45s" when adding rules
  • Solution:
    • Quick Fix: Uncheck the "Enable LLM Enhancement" checkbox before adding rules - rules will be added immediately without LLM processing
    • With Enhancement: Keep the checkbox checked and allow time for processing - enhancement can take up to 180s per chunk of 5 rules (about 36s per rule)
    • Best Practice: Add rules in smaller batches (1-3 rules at a time) when using enhancement
  • Note: Enhancement is optional - you can always add rules quickly without it, then enhance them later if needed

Rule Deletion Issues

  • Issue: "404 Not Found" when trying to delete a rule
  • Solution: You can now delete rules in two ways:
    • By Number: Enter the rule number (e.g., "1", "2", "3") as shown in the rules table
    • By Text: Enter the exact rule text as displayed in the rules table
  • If rule not found: Make sure you're entering the exact text or a valid rule number. Refresh the rules table to see current rules.
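
The two deletion modes can be pictured as a single lookup. The sketch below assumes rule numbers are 1-based positions in the rules table and that text matching is exact after trimming whitespace; `resolve_rule` is a hypothetical name, not the backend's function.

```python
from typing import Optional


def resolve_rule(identifier: str, rules: list[str]) -> Optional[str]:
    """Map user input (a rule number or the exact rule text) to a stored rule.

    Returns None when nothing matches, which corresponds to the 404 case.
    """
    ident = identifier.strip()
    if ident.isdecimal():
        index = int(ident) - 1  # assumption: the table numbers rules from 1
        return rules[index] if 0 <= index < len(rules) else None
    return ident if ident in rules else None
```

A `None` result for input that looks correct usually means the table on screen is stale, which is why refreshing it first is recommended.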

Tenant Isolation Issues

  • Issue: Documents or data leaking between tenants
  • Solution:
    • Check database queries include WHERE tenant_id = ... filters
    • Verify tenant ID normalization is working correctly
    • Review database logs for queries that run without a tenant_id filter
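
A quick way to convince yourself that the filter behaves as intended is to plant a marker in one tenant's data and query as another. The sketch below uses an in-memory SQLite database as a stand-in for the PostgreSQL store; the `documents` table and its columns are hypothetical.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway database holding documents for two tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, content TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO documents (tenant_id, content) VALUES (?, ?)",
        [("tenant_a", "A's handbook"),
         ("tenant_b", "B's secret plan"),
         ("tenant_a", "A's FAQ")],
    )
    return conn


def documents_for(conn: sqlite3.Connection, tenant_id: str) -> list[str]:
    """Every read goes through a parameterized WHERE tenant_id = ? filter."""
    rows = conn.execute(
        "SELECT content FROM documents WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    )
    return [row[0] for row in rows]


def leaked_rows(conn: sqlite3.Connection, tenant_id: str, marker: str) -> list[str]:
    """Rows visible to `tenant_id` that contain a marker planted for another tenant."""
    return [text for text in documents_for(conn, tenant_id) if marker in text]
```

An empty list from `leaked_rows(conn, "tenant_a", "secret")` is the expected outcome; any other result points at a query path that bypasses the filter.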

Getting Help

  1. Check Logs: Review FastAPI and MCP server logs for detailed error messages
  2. Run Diagnostics: Use helper scripts in the Testing & Diagnostics section
  3. Verify Configuration: Check .env file and Supabase connection
  4. Review Documentation: See backend/README.md for backend-specific issues

Testing & Diagnostics

You can test the system by:

  • API Testing: Use the FastAPI interactive docs at http://localhost:8000/docs to test endpoints
  • Database Inspection: Connect directly to your PostgreSQL/Supabase instance to verify tenant isolation
  • Log Monitoring: Check FastAPI and MCP server logs for detailed error messages and debugging information

Tip: Ensure the Python virtual environment is active (source venv/bin/activate or .\venv\Scripts\activate) and that .env contains the MCP server URLs/LLM settings.


  • ✅ Prerequisites: FastAPI backend plus all MCP servers (RAG/Web/Admin) running locally.

  • ✅ What it checks:

    1. Direct database writes via the analytics and rules stores
    2. CRUD over the /admin/* and /analytics/* endpoints
    3. RAG ingestion and isolation by issuing queries as multiple tenants and ensuring secrets never leak across IDs
  • ✅ Pass criteria: At least 80% of the sub-tests succeed (the RAG isolation test must pass for overall success).

  • python check_rag_database.py
    Provides a low-level inspection of the RAG datastore. It connects straight to the pgvector/Postgres instance, lists all tenant IDs, prints sample chunks, and runs search_vectors() directly to ensure the SQL WHERE tenant_id = … filter is behaving as expected. Use this script when diagnosing suspected cross-tenant leakage or when seeding demo data.

  • python verify_supabase_setup.py
    Verifies Supabase configuration and shows which backend (Supabase or SQLite) each store is using. Displays any missing configuration and provides a summary of where data will be saved.

  • python check_supabase_rules.py
    Checks Supabase admin rules configuration and RLS policies. Validates that rules can be read/written correctly.

  • python migrate_sqlite_to_supabase.py
    One-shot migration script that copies existing SQLite data (admin rules + analytics) to Supabase. Supports both PostgreSQL direct connection and Supabase REST API methods.

  • python test_manual.py
    The existing manual test runner remains useful for smoke-testing analytics logging, admin rule CRUD, and API response codes. Run it whenever you adjust schemas or update MCP endpoints.


Demo Video

🎥 [Demo Video Placeholder] - Coming soon!

Watch how IntegraChat uses MCP to power autonomous agents with multi-tool selection, RAG retrieval, and enterprise governance.


Social Media

📱 [Social Media Post Placeholder] - Coming soon!

Follow us for updates and demos of IntegraChat in action!


Team Member(s)

  • Your Name Here - Developer & MCP Enthusiast

License

This project is licensed under the MIT License - see the LICENSE file for details.


Technical Stack

Backend

  • Framework: FastAPI with async/await for high-performance MCP orchestration
  • MCP Server: Unified MCP server (port 8900) exposing all tools via namespaces
  • API: RESTful API with Server-Sent Events (SSE) for streaming responses
  • LLM Integration:
    • Ollama (local, default) - http://localhost:11434
    • Groq (cloud) - via API key
    • Configurable backend with streaming support

Frontend

  • Gradio UI: Full-featured interface with Plotly visualizations (app.py)
  • UI Libraries:
    • Plotly for interactive charts and visualizations

Data Storage

  • RAG Vector Store: PostgreSQL with pgvector extension (via Supabase or direct connection)
  • Analytics: Supabase (production) or SQLite (development) with indexed queries
  • Rules Storage: Supabase (production) or SQLite (development) with automatic fallback
  • Database: PostgreSQL for RAG embeddings, Supabase/SQLite for analytics and rules

File Processing

  • Supported Formats: TXT, PDF, DOC, DOCX, Markdown
  • Libraries: PyPDF2, python-docx for server-side text extraction
  • Metadata: Rich metadata support (source URL, timestamp, document type)

Communication

  • Streaming: Server-Sent Events (SSE) for real-time word-by-word response streaming
  • Protocol: Model Context Protocol (MCP) for tool communication
  • HTTP: RESTful endpoints with JSON payloads
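
Word-by-word streaming over SSE comes down to emitting one `data:` frame per word, each terminated by a blank line. The generator below is a minimal sketch of that framing; the `[DONE]` sentinel is an assumption for illustration and may not match what the backend actually sends.

```python
from typing import Iterator


def sse_word_stream(text: str) -> Iterator[str]:
    """Yield Server-Sent Events frames, one per word of `text`."""
    for word in text.split():
        # Each SSE message is "data: <payload>" followed by a blank line.
        yield f"data: {word}\n\n"
    # Hypothetical end-of-stream marker so clients know when to stop reading.
    yield "data: [DONE]\n\n"
```

In FastAPI, a generator like this can be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.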

Recent Enhancements

UI & UX Improvements (Latest)

  • Document Display Fix: Fixed document list showing [object Object] - now properly displays document ID, type, preview, length, and creation date in a formatted table
  • Rule Deletion Enhancement: Can now delete rules by entering either:
    • Rule number (e.g., "1", "2", "3") - automatically finds the corresponding rule
    • Full rule text - deletes the exact matching rule
  • LLM Enhancement Toggle: Added checkbox to enable/disable LLM enhancement when adding rules:
    • Quick Add: Uncheck to add rules immediately without LLM processing (no timeout issues)
    • Enhanced Add: Check to get better patterns, explanations, and examples (takes longer but higher quality)
  • Improved Timeouts: Increased timeout for rule enhancement from 45s to 180s to handle multiple rules properly
  • Better Error Messages: Clearer error messages for rule deletion, document operations, and permission errors

Role Propagation & Permission Handling (Latest)

  • Fixed Role Propagation: User role (viewer, editor, admin, owner) is now properly passed through the entire ingestion pipeline:
    • UI sends role in x-user-role header
    • Backend API route receives and validates role
    • Role is passed to process_ingestion() service
    • RAG client includes role in payload to MCP server
    • MCP server uses role for permission checks
  • Improved Error Handling: Permission errors (403 Forbidden) now return clear, actionable error messages:
    • Clear indication when role lacks required permissions
    • Guidance on which roles can perform specific actions
    • Instructions to change role in UI dropdown
  • Debug Logging: Added comprehensive debug logging to trace role values through the pipeline for troubleshooting
  • Admin Question Handling: Fixed "who is the admin" type questions to use RAG from knowledge base instead of generic LLM responses

Admin Rules System (Latest)

  • File Upload Support: Upload rules from TXT, PDF, DOC, DOCX files with drag-and-drop interface
  • LLM Enhancement Toggle: Optional LLM enhancement with checkbox control:
    • Quick Add Mode: Uncheck to add rules immediately without LLM processing (no timeouts)
    • Enhanced Mode: Check to get better patterns, explanations, examples, and edge case detection
  • LLM Enhancement: When enabled, automatic rule enhancement identifies edge cases, improves regex patterns, and suggests severity levels
  • Intelligent Fallback Explanations: When LLM enhancement times out or fails, the system automatically generates basic explanations using keyword extraction, providing useful examples and pattern suggestions without requiring LLM availability
  • Chunk Processing: Large rule sets processed in chunks of 5 to prevent timeouts (handles 100+ rules efficiently)
  • Enhanced Timeouts: Increased timeout from 45s to 180s per chunk to accommodate LLM processing
  • Flexible Rule Deletion: Delete rules by entering either rule number (e.g., "1") or full rule text
  • Comment Filtering: Comment lines (starting with #) automatically ignored when uploading rules
  • Rule-First Processing: Admin rules checked before intent classification - enables behavior control (brief responses vs blocking)
  • Supabase Integration: Production-ready Supabase support with automatic table creation
  • Streaming Responses: Word-by-word streaming for chat responses using Server-Sent Events (SSE)
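
Comment filtering and chunked processing can be sketched in a few lines. This is an illustration under the assumptions stated in the list above (lines starting with `#` are ignored, chunks hold 5 rules); the function names are hypothetical.

```python
def parse_rule_lines(raw: str) -> list[str]:
    """Split uploaded text into rules, dropping blank lines and '#' comments."""
    rules = []
    for line in raw.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            rules.append(stripped)
    return rules


def chunk_rules(rules: list[str], size: int = 5) -> list[list[str]]:
    """Group rules into chunks so each LLM enhancement call stays small."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [rules[i:i + size] for i in range(0, len(rules), size)]
```

With a chunk size of 5 and a 180s timeout per chunk, a file of 12 rules yields three chunks, so the worst-case enhancement time would be roughly three timeouts rather than one.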

Conversation Memory System (Latest)

  • Short-Term Memory: Automatic storage of tool outputs per session with configurable size limits and TTL
  • Session-Based Isolation: Memory keyed by session_id (not tenant_id) for safety
  • Automatic Injection: Recent memory automatically injected into tool payloads for multi-step workflows
  • Auto-Expiration: Memory entries expire after configurable TTL (default: 15 minutes)
  • Session Management: Memory can be explicitly cleared via end_session flag
  • Comprehensive Testing: Full test suite covering memory storage, retrieval, expiration, and multi-step workflows
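
The memory behaviour described above (per-session storage, a size limit, a TTL, explicit clearing) can be sketched as a small class. This is not the project's implementation; the class and method names are hypothetical, and only the 15-minute default TTL comes from the list above.

```python
import time
from collections import deque
from typing import Callable


class SessionMemory:
    """Short-term store of tool outputs, keyed by session_id."""

    def __init__(self, max_items: int = 10, ttl_seconds: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_items = max_items
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict[str, deque] = {}

    def add(self, session_id: str, tool: str, output: str) -> None:
        """Record a tool output; the oldest entry is dropped once the limit is hit."""
        entries = self._store.setdefault(session_id, deque(maxlen=self._max_items))
        entries.append((self._clock(), tool, output))

    def recent(self, session_id: str) -> list[tuple[str, str]]:
        """Return unexpired (tool, output) pairs, oldest first."""
        now = self._clock()
        entries = self._store.get(session_id, ())
        return [(tool, out) for ts, tool, out in entries if now - ts < self._ttl]

    def end_session(self, session_id: str) -> None:
        """Drop everything stored for a session (the end_session flag case)."""
        self._store.pop(session_id, None)
```

Keying by `session_id` means two conversations from the same tenant never see each other's tool outputs, which is the safety property the list above refers to.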

AI-Generated KB Metadata & Advanced RAG (Latest)

  • Automatic Metadata Extraction: When ingesting documents, system auto-extracts:
    • Title: From filename, URL, or content structure (with intelligent fallback)
    • Summary: 2-3 sentence summary via LLM (with keyword-based fallback)
    • Tags: 5-8 relevant tags extracted from content
    • Topics: 3-5 main themes identified via LLM
    • Date Detection: Multiple date formats automatically detected
    • Quality Score: 0.0-1.0 score based on structure and completeness
  • Intelligent Fallback: When LLM is unavailable or times out, uses keyword extraction and pattern matching to provide useful metadata
  • Database Integration: Metadata stored in JSONB column for flexible querying and enhanced RAG search
  • Migration Script: Safe, idempotent database migration script included

Per-Tool Latency Prediction & Context-Aware Routing (Latest)

  • Latency Prediction: Agent estimates expected latency before tool selection:
    • RAG: 60-120ms (depends on result count)
    • Web: 400-1800ms (network-dependent)
    • Admin: <20ms (local regex matching)
    • LLM: Variable based on model and token count
  • Path Optimization: Agent chooses fastest tool sequence based on latency estimates
  • Context-Aware Routing: Intelligent tool skipping based on previous outputs:
    • High RAG score (≥0.8) → Skip web search
    • Critical admin violation → Skip agent reasoning, immediate block
    • Relevant memory available → Skip RAG, use memory instead
  • Routing Hints: Context hints included in reasoning trace for transparency
  • Performance Impact: Skipping tools that are unlikely to add information shortens the tool sequence and reduces end-to-end response time
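
The three skipping rules can be expressed as a small decision function. This sketch assumes the admin severity label is the string "critical" and uses hypothetical names throughout; only the 0.8 score threshold is taken from the list above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RoutingDecision:
    block: bool = False                              # stop immediately, no agent reasoning
    skip: set = field(default_factory=set)           # tool names to leave out of the plan
    hints: list = field(default_factory=list)        # human-readable notes for the trace


def route(rag_top_score: Optional[float] = None,
          admin_severity: Optional[str] = None,
          has_relevant_memory: bool = False) -> RoutingDecision:
    """Decide which tools to skip based on what earlier steps produced."""
    decision = RoutingDecision()
    if admin_severity == "critical":
        decision.block = True
        decision.hints.append("critical admin violation: blocking immediately")
        return decision
    if has_relevant_memory:
        decision.skip.add("rag")
        decision.hints.append("relevant memory found: skipping RAG")
    if rag_top_score is not None and rag_top_score >= 0.8:
        decision.skip.add("web")
        decision.hints.append("high RAG score: skipping web search")
    return decision
```

The `hints` list plays the role of the routing hints surfaced in the reasoning trace, so a skipped tool is always explained rather than silently omitted.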

Tool Output Schemas (Latest)

  • Strict JSON Schemas: Every tool returns validated JSON with consistent structure:
    • RAG: {results: [...], top_score: float, latency_ms: int}
    • Web: {results: [...], latency_ms: int}
    • Admin: {violations: [...], severity: str, latency_ms: int}
    • LLM: {text: str, tokens_used: int, latency_ms: int}
  • Automatic Validation: All tool outputs validated and formatted before use
  • Easier Debugging: Consistent structure makes debugging and monitoring simpler
  • Polished Responses: Schema-validated outputs ensure professional appearance
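
The schemas listed above are simple enough to check with the standard library. The validator below is a sketch of that idea, not the project's validation code; it treats integers as acceptable where a float is expected, which is an assumption.

```python
# Field names and types mirror the schemas listed above.
TOOL_SCHEMAS = {
    "rag": {"results": list, "top_score": float, "latency_ms": int},
    "web": {"results": list, "latency_ms": int},
    "admin": {"violations": list, "severity": str, "latency_ms": int},
    "llm": {"text": str, "tokens_used": int, "latency_ms": int},
}


def _matches(value: object, expected: type) -> bool:
    """Type check that rejects booleans for numeric fields."""
    if isinstance(value, bool):
        return expected is bool
    if expected is float:
        return isinstance(value, (int, float))
    return isinstance(value, expected)


def validate_tool_output(tool: str, output: dict) -> list[str]:
    """Return a list of problems; an empty list means the output is valid."""
    schema = TOOL_SCHEMAS.get(tool)
    if schema is None:
        return [f"unknown tool: {tool}"]
    errors = []
    for name, expected in schema.items():
        if name not in output:
            errors.append(f"missing field: {name}")
        elif not _matches(output[name], expected):
            actual = type(output[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    return errors
```

Returning a list of problems instead of raising on the first one tends to make debugging easier, since a malformed tool response is reported in full.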

Cross-Encoder Re-ranking (Latest)

  • Two-Stage RAG Process:
    • Initial vector search retrieves candidates
    • Cross-encoder re-ranks top 10 results for accuracy
    • Final filtering by threshold and limit
  • Model: Uses cross-encoder/ms-marco-MiniLM-L-6-v2 (very fast, production-ready)
  • Accuracy Improvement: The cross-encoder scores each query-document pair jointly, which improves the relevance ordering of the retrieved candidates
  • Seamless Integration: Works transparently with existing RAG search API
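
The two-stage flow can be sketched independently of any model by injecting the scoring function. The names below are hypothetical; with the sentence-transformers package installed, `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict` accepts a list of (query, passage) pairs and could be passed as `score_pairs`.

```python
from typing import Callable, Sequence


def rerank(query: str,
           candidates: Sequence[str],
           score_pairs: Callable[[list], Sequence[float]],
           top_n: int = 10,
           threshold: float = 0.0,
           limit: int = 5) -> list[tuple[str, float]]:
    """Re-rank the first `top_n` vector-search candidates with a pair scorer.

    `candidates` is assumed to arrive ordered by vector similarity.
    The score scale, and therefore a sensible threshold, depends on the scorer.
    """
    head = list(candidates[:top_n])
    if not head:
        return []
    scores = score_pairs([(query, text) for text in head])
    ranked = sorted(zip(head, scores), key=lambda pair: pair[1], reverse=True)
    kept = [(text, float(score)) for text, score in ranked if score >= threshold]
    return kept[:limit]


def word_overlap(pairs: list) -> list[float]:
    """Toy scorer for demonstration: counts words shared by query and passage."""
    return [float(len(set(q.lower().split()) & set(t.lower().split())))
            for q, t in pairs]
```

Keeping the scorer as a parameter means the ranking logic can be exercised with the toy `word_overlap` function in tests, without downloading a model.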

Context Engineering (Latest)

  • Anthropic-Inspired Strategies: Implements best practices from Anthropic's context engineering research:
    • Compaction: High-fidelity summarization preserving architectural decisions, unresolved issues, and implementation details
    • Tool Result Clearing: Safest form of compaction - removes large tool outputs once processed
    • Structured Note-Taking: Tracks objectives (like Claude playing Pokémon), architectural decisions, and unresolved issues
    • XML-Structured Prompts: All prompts use clear XML sections (<system>, <background_information>, <instructions>) for better model understanding
    • Automatic Compression: Conversations compressed at 80% token threshold, targeting 60% after compression
    • Just-in-Time Context: Selects only relevant memories and tools for each query
    • Progressive Disclosure: Agents discover context incrementally through exploration
  • Benefits:
    • Reduced token usage and costs
    • Longer conversation support
    • Better agent coherence across extended interactions
    • Improved performance through structured context
  • Documentation: Context engineering features are integrated throughout the agent orchestrator and MCP server
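
Two of the strategies above, the 80%/60% compression thresholds and tool result clearing, are easy to illustrate. The sketch below assumes messages are dicts with `role` and `content` keys and uses hypothetical function names; the actual orchestrator may structure this differently.

```python
PLACEHOLDER = "[tool result cleared]"


def needs_compression(used_tokens: int, context_window: int,
                      trigger_pct: int = 80) -> bool:
    """True once usage reaches the trigger share of the context window."""
    return used_tokens * 100 >= context_window * trigger_pct


def target_tokens(context_window: int, target_pct: int = 60) -> int:
    """Token budget to aim for after compression."""
    return context_window * target_pct // 100


def clear_old_tool_results(messages: list[dict], keep_last: int = 1) -> list[dict]:
    """Replace all but the most recent tool outputs with a short placeholder.

    Returns new message dicts; the input list is left untouched.
    """
    tool_positions = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
    keep = set(tool_positions[-keep_last:]) if keep_last > 0 else set()
    return [
        {**m, "content": PLACEHOLDER}
        if m.get("role") == "tool" and i not in keep else m
        for i, m in enumerate(messages)
    ]
```

Clearing processed tool outputs first is the cheapest step, so it makes sense to try it before summarizing the conversation itself.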

UI Improvements

  • Modern Drag-and-Drop: Intuitive file upload with visual feedback
  • Enhanced Status Messages: Clear success/error messages with icons
  • Refresh Button in Table: Quick refresh directly from the Rule Set section
  • Better Visual Hierarchy: Improved spacing, colors, and layout
  • Gradio UI Enhancements:
    • AI metadata displayed after document ingestion
    • Latency predictions shown in reasoning trace
    • Context-aware routing hints visualized
    • Tool output schemas displayed in debug view

Key Technical Features

Tenant Isolation & Normalization

  • Strict tenant isolation enforced at database level with WHERE tenant_id = ... filters
  • Automatic tenant ID normalization handles whitespace and formatting differences
  • Documents can be listed and deleted consistently across different tenant_id formats
  • All operations validate tenant ownership before execution
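
Tenant ID normalization is what keeps `"Acme"` and `" acme "` from being treated as two tenants. The exact rules are not spelled out here, so the sketch below is an assumption: it trims surrounding whitespace and lowercases the ID, and the function name is hypothetical.

```python
def normalize_tenant_id(raw: str) -> str:
    """Canonicalize a tenant ID before it is used in any query.

    Assumed rules: strip surrounding whitespace and lowercase the value.
    """
    if not isinstance(raw, str) or not raw.strip():
        raise ValueError("tenant_id must be a non-empty string")
    return raw.strip().lower()
```

Applying one such function at every entry point (API routes, MCP tools, scripts) is what makes listing and deleting behave consistently regardless of how the ID was typed.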

RAG Search & Retrieval

  • Cross-Encoder Re-ranking: Two-stage retrieval process for massive accuracy improvement:
    • First: Vector search retrieves top candidates using embeddings
    • Then: Cross-encoder model (cross-encoder/ms-marco-MiniLM-L-6-v2) re-ranks top 10 results
    • Final: Results filtered by threshold and limit applied
  • Optimized similarity threshold (default 0.3) for better recall of relevant documents
  • Intelligent fallback returns top result even if below threshold to ensure knowledge base content is accessible
  • Pattern-based tool selection automatically triggers RAG for admin questions, fact lookups, and internal knowledge queries
  • Response unwrapping ensures seamless integration between MCP server and orchestrator
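
The threshold-plus-fallback behaviour can be summarized in one function. This sketch assumes each result is a dict with a `score` key and uses a hypothetical function name; the default threshold of 0.3 is the one mentioned above.

```python
def filter_results(results: list[dict], threshold: float = 0.3,
                   limit: int = 5) -> list[dict]:
    """Keep results at or above the threshold, best first.

    If nothing clears the threshold, fall back to the single best result
    so that knowledge base content stays reachable.
    """
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    kept = [r for r in ranked if r["score"] >= threshold]
    if not kept and ranked:
        kept = ranked[:1]
    return kept[:limit]
```

The fallback trades some precision for recall: the agent may see a weak match, but it never concludes that the knowledge base is empty when it is not.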

MCP Server Architecture

  • Unified server running on a single port (default 8900) for all namespaced tools
  • Dual protocol support: Both MCP protocol (POST with JSON) and RESTful HTTP (GET/DELETE)
  • Response wrapping: Standardized response format with automatic unwrapping in clients
  • Error handling: Comprehensive error responses with detailed messages for debugging

UI Features

Knowledge Base Library

  • Visual Statistics: Real-time document counts and type distribution
  • Interactive Charts: Plotly pie charts for document type visualization
  • Advanced Search: Semantic search across all ingested documents with relevance scoring
  • Smart Filtering: Filter by document type (text, PDF, FAQ, link)
  • Bulk Operations: Delete individual documents or all documents at once
  • Auto-refresh: Lists automatically update after operations

Admin Analytics Dashboard

  • Statistics Cards: Key metrics displayed in visually appealing cards with icons
  • Tool Usage Visualization: Bar charts showing tool invocation counts and performance
  • Latency Metrics: Visual representation of tool response times
  • RAG Quality Analysis: Charts displaying search quality metrics (hits, scores, recall)
  • Detailed Tables: Comprehensive tool usage breakdown with success/error rates
  • Dark Theme: Modern UI with dark background and white text for better readability
  • Real-time Updates: Fetch latest analytics data with a single click

Acknowledgments


Made with ❤️ for the MCP Hackathon

IntegraChat: Enterprise-Grade MCP Autonomous Agent Platform

⬆ Back to Top