
Backend Documentation

This folder contains the production-ready FastAPI stack plus the companion MCP servers that power IntegraChat.

Directory Overview

  • api/ – FastAPI application (routes, services, storage helpers, MCP clients)
  • mcp_server/ – Unified MCP server exposing rag/web/admin tools via namespaces
  • workers/ – Celery workers and schedulers for async ingestion + analytics maintenance

Prerequisites

  • Python 3.10+
  • PostgreSQL (with the vector extension) for RAG data, or Supabase with pgvector enabled
  • Supabase (recommended) for admin rules + analytics storage, with automatic SQLite fallback in data/
    • Both RulesStore and AnalyticsStore automatically detect and use Supabase when configured
    • Falls back to SQLite automatically if Supabase credentials are missing
    • See SUPABASE_SETUP.md in the root directory for setup instructions
  • Optional: Ollama running locally (default) or Groq API credentials for remote LLMs

Create a virtual environment at the repo root, then:

pip install -r requirements.txt
cp env.example .env   # update MCP URLs + LLM settings

Running the Services Locally

  1. FastAPI core

    uvicorn backend.api.main:app --port 8000 --reload
    
  2. Unified MCP server (rag/web/admin)

    python backend/mcp_server/server.py
    

    Or use the provided startup script:

    start.bat  # Windows - launches MCP server on port 8900 and FastAPI on port 8000
    

    This single server (default port 8900) exposes the following namespaced tools:

    • rag.search - Semantic search across tenant documents
    • rag.ingest - Ingest text content into knowledge base
    • rag.delete - Delete individual or all documents for a tenant
    • rag.list - List all documents for a tenant with pagination
    • web.search - Web search via Google Programmable Search (Custom Search API)
    • admin.getRules, admin.addRule, admin.deleteRule, admin.logViolation

    HTTP Endpoints (for direct API access):

    • GET /rag/list?tenant_id={id}&limit={n}&offset={n} - List documents
    • POST /rag/ingest - Ingest content
    • POST /rag/search - Search documents (supports threshold parameter, default: 0.3)
    • DELETE /rag/delete/{document_id}?tenant_id={id} - Delete specific document
    • DELETE /rag/delete-all?tenant_id={id} - Delete all documents
    • POST /web/search - Web search
    • POST /admin/* - Admin operations
  3. Optional workers (if running Celery-based ingestion/analytics jobs):

    celery -A backend.workers.ingestion_worker worker --loglevel=info
    celery -A backend.workers.analytics_worker worker --loglevel=info
    

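The HTTP endpoints above can be exercised with only the Python standard library. In this sketch the tenant ID acme-corp is illustrative, and the default limit/offset values are assumptions rather than documented server defaults:

```python
import json
import urllib.parse
import urllib.request

MCP_BASE = "http://localhost:8900"  # default unified MCP server port

def build_list_url(tenant_id: str, limit: int = 20, offset: int = 0) -> str:
    # Build GET /rag/list with tenant scoping and pagination
    query = urllib.parse.urlencode(
        {"tenant_id": tenant_id, "limit": limit, "offset": offset})
    return f"{MCP_BASE}/rag/list?{query}"

def list_documents(tenant_id: str, limit: int = 20, offset: int = 0) -> dict:
    # Requires the MCP server to be running locally
    with urllib.request.urlopen(build_list_url(tenant_id, limit, offset)) as resp:
        return json.loads(resp.read())

print(build_list_url("acme-corp", limit=5))
```

The same pattern works for the DELETE endpoints by setting the request method explicitly.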
The Gradio UI (python app.py) and the Next.js operator console (see frontend/README.md) both talk to the FastAPI layer at http://localhost:8000.

Key Endpoints

All endpoints require the x-tenant-id header unless otherwise noted.

Service Path Notes
Agent POST /agent/message Autonomous orchestration (RAG/Web/Admin/LLM)
Agent Debug POST /agent/debug Full reasoning trace + tool plan
Agent Plan POST /agent/plan Dry-run planning without executing tools
RAG POST /rag/ingest-document Rich ingestion (text, URL, metadata)
RAG POST /rag/ingest-file File upload (PDF/DOCX/TXT/MD)
RAG GET /rag/list Paginated document listing per tenant (requires x-tenant-id header)
RAG DELETE /rag/delete/{document_id} Delete specific document (requires x-tenant-id header)
RAG DELETE /rag/delete-all Delete all documents for tenant (requires x-tenant-id header)
Admin POST /admin/rules Regex + severity rule ingestion
Analytics GET /analytics/overview Summary metrics (queries, tokens, red flags)
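A hedged sketch of calling the agent endpoint from Python. The x-tenant-id and x-user-role header names come from this document; the exact JSON body expected by /agent/message (here a single message field) is an assumption, so check the route definition before relying on it:

```python
import json

API_BASE = "http://localhost:8000"  # FastAPI layer

def build_agent_request(tenant_id: str, message: str, role: str = "viewer"):
    # Header names come from this README; the body shape is an assumption.
    headers = {
        "x-tenant-id": tenant_id,        # required by all endpoints
        "x-user-role": role,             # consumed by permission checks
        "Content-Type": "application/json",
    }
    return f"{API_BASE}/agent/message", headers, json.dumps({"message": message})

url, headers, body = build_agent_request("acme-corp", "What is our refund policy?")
```

Pass the three values to any HTTP client (requests, httpx, urllib) to issue the POST.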

Refer to the root README.md for the complete endpoint tables.

Diagnostics & Tenant Isolation

Use the helper scripts in the repo root when validating backend changes:

  • python verify_tenant_isolation.py – Exercises analytics logging, admin rule CRUD, API reachability, and proves RAG tenant isolation by ingesting + querying as multiple tenants.
  • python check_rag_database.py – Talks directly to the pgvector database to list tenant IDs, preview stored chunks, and run safeguarded searches via search_vectors(). Helpful when troubleshooting suspected cross-tenant leakage.
  • python verify_supabase_setup.py – Verifies Supabase configuration and shows which backend (Supabase or SQLite) each store is using.
  • python check_supabase_rules.py – Checks Supabase admin rules configuration and RLS policies.
  • python migrate_sqlite_to_supabase.py – One-shot migration script to copy existing SQLite data to Supabase.
  • python test_manual.py – Legacy manual smoke test harness (analytics store, admin rules, API surface).

Troubleshooting tip: If the isolation script reports a failure, first run check_rag_database.py to confirm documents are tagged with the correct tenant_id, then restart the unified MCP server so it reloads the updated SQL filtering logic.

Recent Improvements

Tenant ID Normalization

  • All database operations now normalize tenant IDs to absorb whitespace and formatting differences
  • Documents can be listed and deleted consistently even if they were stored with slightly different tenant_id formatting, because IDs are matched after normalization
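Sketched, the normalization might look like this; the exact rules (for example whether IDs are lowercased) are an implementation detail of the storage helpers:

```python
def normalize_tenant_id(raw: str) -> str:
    # Trim surrounding whitespace, collapse internal runs, lowercase.
    # Illustrative only: the real rules live in the storage helpers.
    return " ".join(raw.split()).lower()
```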

HTTP Endpoint Support

  • Added GET support for /rag/list endpoint (previously POST-only)
  • Added DELETE support for /rag/delete/{document_id} and /rag/delete-all endpoints
  • All endpoints support both MCP protocol (POST with JSON payload) and direct HTTP methods (GET/DELETE with query parameters)

Response Format

  • MCP server responses are wrapped in a standard format with status, data, and metadata fields
  • RAG client automatically unwraps responses for seamless integration
  • Error responses include detailed messages for better debugging
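A sketch of the unwrap step the RAG client performs; the concrete status values ("ok"/"error") are assumptions based on the field names above:

```python
def unwrap_mcp_response(wrapped: dict) -> dict:
    # Envelope fields per this README: status, data, metadata.
    # The literal status values are assumptions.
    if wrapped.get("status") == "error":
        detail = wrapped.get("data") or {}
        raise RuntimeError(detail.get("message", "unknown MCP error"))
    return wrapped.get("data", {})

sample = {"status": "ok",
          "data": {"results": [], "top_score": 0.0},
          "metadata": {"latency_ms": 12}}
print(unwrap_mcp_response(sample))
```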

RAG Search Enhancements

  • Cross-Encoder Re-ranking: Two-stage retrieval for a substantial accuracy improvement:
    • Initial vector search retrieves top candidates using embeddings
    • Cross-encoder model (cross-encoder/ms-marco-MiniLM-L-6-v2) re-ranks top 10 results
    • Final filtering by threshold and limit applied
    • Seamlessly integrated with existing search API
  • Lowered default threshold from 0.5 to 0.3 for improved recall of relevant documents
  • Intelligent fallback mechanism returns the top result even if similarity score is below threshold, ensuring knowledge base content is always accessible
  • Configurable threshold via threshold parameter in search requests (default: 0.3)
  • Enhanced tool selection automatically triggers RAG for admin questions, fact lookups ("who is", "what is"), and internal knowledge queries
  • Response unwrapping in MCP client ensures orchestrator receives properly formatted results for tool scoring and prompt building
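The two-stage flow, including the fallback behaviour, can be sketched as follows. The real implementation scores pairs with cross-encoder/ms-marco-MiniLM-L-6-v2 (for example via sentence-transformers' CrossEncoder.predict); here the scorer is injected so the sketch stays runnable without model downloads:

```python
from typing import Callable, List, Tuple

Scored = Tuple[str, float]

def rerank(query: str,
           candidates: List[Scored],               # (chunk, vector score), best first
           score_fn: Callable[[str, str], float],  # e.g. wraps CrossEncoder.predict
           top_k: int = 10,
           threshold: float = 0.3,
           limit: int = 5) -> List[Scored]:
    # Stage 2: re-score the top_k vector hits with the cross-encoder
    rescored = sorted(((text, score_fn(query, text)) for text, _ in candidates[:top_k]),
                      key=lambda pair: pair[1], reverse=True)
    kept = [pair for pair in rescored if pair[1] >= threshold][:limit]
    return kept or rescored[:1]   # fallback: always surface the single best hit
```

The top_k, threshold, and limit defaults mirror the values quoted in this section; the function signature itself is illustrative.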

Conversation Memory System

  • Short-Term Memory: Automatic storage of tool outputs per session with configurable size limits (default: 10 outputs) and TTL (default: 900 seconds / 15 minutes)
  • Session-Based Isolation: Memory is keyed by session_id (not tenant_id) for safety, ensuring no cross-tenant data mixing
  • Automatic Injection: Recent memory is automatically injected into tool payloads as a memory field, enabling tools to make context-aware decisions in multi-step workflows
  • Auto-Expiration: Memory entries automatically expire after TTL or can be explicitly cleared via end_session/endSession flag
  • Configuration: Tune behavior via environment variables:
    • MCP_MEMORY_MAX_ITEMS: Maximum number of tool outputs to keep per session (default: 10)
    • MCP_MEMORY_TTL_SECONDS: Time-to-live for memory entries in seconds (default: 900)
  • Comprehensive Testing: Full test suite in backend/tests/test_conversation_memory.py covering storage, retrieval, expiration, and multi-step workflows
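The memory behaviour described above, sketched as a small in-process store (the production implementation lives in the MCP server and may differ in detail):

```python
import time
from collections import defaultdict, deque

MCP_MEMORY_MAX_ITEMS = 10      # default per this README
MCP_MEMORY_TTL_SECONDS = 900   # default per this README (15 minutes)

class ConversationMemory:
    """Per-session short-term memory with a size cap and TTL.
    Keyed by session_id (never tenant_id), matching the isolation rule above."""

    def __init__(self, max_items=MCP_MEMORY_MAX_ITEMS, ttl=MCP_MEMORY_TTL_SECONDS):
        self.max_items, self.ttl = max_items, ttl
        self._store = defaultdict(deque)   # session_id -> deque of (timestamp, output)

    def add(self, session_id: str, tool_output: dict) -> None:
        entries = self._store[session_id]
        entries.append((time.monotonic(), tool_output))
        while len(entries) > self.max_items:
            entries.popleft()              # evict oldest beyond the cap

    def recent(self, session_id: str) -> list:
        # Return unexpired outputs (oldest first), pruning stale entries
        now = time.monotonic()
        entries = self._store[session_id]
        while entries and now - entries[0][0] > self.ttl:
            entries.popleft()
        return [out for _, out in entries]

    def end_session(self, session_id: str) -> None:
        self._store.pop(session_id, None)  # explicit clear (end_session flag)
```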

AI-Generated KB Metadata

When ingesting documents, the system automatically extracts rich metadata:

  • Title Extraction: From filename, URL, or content structure (with intelligent fallback)
  • Summary Generation: 2-3 sentence summary via LLM (with keyword-based fallback)
  • Tag Extraction: 5-8 relevant tags extracted from content
  • Topic Identification: 3-5 main themes identified via LLM
  • Date Detection: Multiple date formats automatically detected
  • Quality Score: 0.0-1.0 score based on structure and completeness

Intelligent Fallback: When LLM is unavailable or times out, uses keyword extraction and pattern matching to provide useful metadata.

Database Integration: Metadata stored in JSONB column (metadata) for flexible querying and enhanced RAG search. Migration script: backend/scripts/migrate_add_metadata.py.

API Response: Ingestion endpoints (/rag/ingest-document, /rag/ingest-file) now return extracted_metadata in the response.

Per-Tool Latency Prediction & Context-Aware Routing

The agent now uses sophisticated routing logic to optimize tool selection:

  • Latency Prediction: Agent estimates expected latency before tool selection:
    • RAG: 60-120ms (depends on result count)
    • Web: 400-1800ms (network-dependent)
    • Admin: <20ms (local regex matching)
    • LLM: Variable based on model and token count
  • Path Optimization: Agent chooses fastest tool sequence based on latency estimates
  • Context-Aware Routing: Intelligent tool skipping based on previous outputs:
    • High RAG score (≥0.8) → Skip web search
    • Critical admin violation → Skip agent reasoning, immediate block
    • Relevant memory available → Skip RAG, use memory instead
  • Routing Hints: Context hints included in reasoning trace for transparency

Implementation: backend/api/services/tool_metadata.py defines latency estimates and routing logic. backend/api/services/tool_selector.py implements context-aware decisions.

Tool Output Schemas

Every tool now returns strict JSON schemas for consistency:

  • RAG: {results: [...], top_score: float, latency_ms: int}
  • Web: {results: [...], latency_ms: int}
  • Admin: {violations: [...], severity: str, latency_ms: int}
  • LLM: {text: str, tokens_used: int, latency_ms: int}

Automatic Validation: All tool outputs validated and formatted in AgentOrchestrator before use. Makes debugging and monitoring simpler.

Schema Definitions: backend/api/services/tool_metadata.py contains TOOL_OUTPUT_SCHEMAS with validation functions.
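A minimal sketch of that validation. The field names match the schemas listed above, while the validator itself is illustrative (the real TOOL_OUTPUT_SCHEMAS ships in tool_metadata.py):

```python
TOOL_OUTPUT_SCHEMAS = {
    "rag":   {"results": list, "top_score": float, "latency_ms": int},
    "web":   {"results": list, "latency_ms": int},
    "admin": {"violations": list, "severity": str, "latency_ms": int},
    "llm":   {"text": str, "tokens_used": int, "latency_ms": int},
}

def validate_tool_output(tool: str, output: dict) -> dict:
    # Strict sketch: every field must be present with the declared type
    schema = TOOL_OUTPUT_SCHEMAS[tool]
    for field, typ in schema.items():
        if not isinstance(output.get(field), typ):
            raise ValueError(f"{tool} output: bad or missing field '{field}'")
    return output
```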

Role Propagation & Permission Handling

  • Fixed Role Propagation: User role is now properly passed from API route → process_ingestion() → RAG client → MCP server
    • rag_client.ingest_with_metadata() now accepts user_role parameter
    • Role is included in payload sent to MCP server: {"user_role": "owner", ...}
    • MCP server extracts role from payload via build_tenant_context() and uses it for permission checks
  • Improved Error Handling:
    • Permission errors (403) return clear messages with actionable guidance
    • Error messages specify which roles are allowed for each action
    • Frontend displays user-friendly error messages with instructions
  • Debug Logging: Added debug logging in route handlers and services to trace role values:
    • Logs role received in route handler
    • Logs role passed to process_ingestion
    • Logs role sent to RAG client
    • Logs role received by MCP server
  • Admin Question Handling: Fixed admin identity questions to prioritize RAG results from knowledge base
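The payload shape and permission check can be sketched as below. The allowed-role set is an assumption inferred from the 403 guidance in the Troubleshooting section; the authoritative check runs in the MCP server via build_tenant_context():

```python
def build_ingest_payload(tenant_id: str, text: str, user_role: str = "viewer") -> dict:
    # The role rides along with the content, as described above
    return {"tenant_id": tenant_id, "text": text, "user_role": user_role}

# Assumption: roles other than viewer may ingest (per the 403 guidance)
ALLOWED_INGEST_ROLES = {"editor", "admin", "owner"}

def check_ingest_permission(payload: dict) -> None:
    role = payload.get("user_role", "viewer")   # a missing role defaults to viewer
    if role not in ALLOWED_INGEST_ROLES:
        raise PermissionError(
            f"Role '{role}' is not permitted to perform 'ingest_documents'; "
            f"allowed roles: {sorted(ALLOWED_INGEST_ROLES)}")
```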

UI Enhancements (app.py)

  • Knowledge Base Library Tab:

    • Statistics cards showing document counts by type
    • Interactive Plotly pie chart for document type distribution
    • Semantic search with relevance scoring
    • Type filtering (text, PDF, FAQ, link)
    • Document management with preview and deletion
    • Auto-refresh after operations
  • Admin Analytics Tab:

    • Statistics cards for key metrics (queries, users, red flags, RAG searches)
    • Interactive Plotly bar charts for tool usage, latency, and RAG quality
    • Detailed tool usage table with performance metrics
    • Formatted summary with dark theme styling
    • Real-time data fetching and visualization
    • Access: All roles can view analytics (viewer, editor, admin, owner)
  • Debug & Reasoning Tab:

    • Reasoning trace analyzer showing step-by-step agent decision-making
    • Tool invocation timeline with latency visualization
    • AI metadata display after document ingestion (title, summary, tags, topics, quality score)
    • Latency predictions shown in reasoning trace (estimated vs actual)
    • Context-aware routing hints visualized (skip web/RAG/reasoning decisions)
    • Tool output schemas displayed in debug view
    • Formatted markdown output with detailed metrics
    • Uses /agent/debug endpoint for comprehensive insights
  • Modern UI/UX:

    • Dark theme with white text for better readability
    • Custom CSS styling for cards and charts
    • Improved error handling and status messages
    • Responsive layout with proper component scaling

LLM-Guided Rule Explanation

The rule enhancement system includes intelligent fallback mechanisms:

  • LLM Enhancement: When available, rules are enhanced with comprehensive explanations, examples, missing patterns, edge cases, and improvements
  • Intelligent Fallback: When LLM times out or fails, the system automatically generates basic explanations using keyword extraction:
    • Detects keywords (password, API key, credit card, sensitive data, etc.)
    • Generates contextual explanations based on detected keywords
    • Provides relevant examples (5-8 examples) based on rule type
    • Suggests missing patterns (3-5 suggestions) for rule improvement
  • Timeout Protection: 30-second timeout per rule with graceful fallback
  • Chunk Processing: Bulk rule processing handles failures gracefully - one rule failure doesn't block others

This ensures users always receive useful rule explanations even when the LLM service is unavailable or slow.
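A sketch of the keyword fallback; the keyword table and wording are placeholders, not the production list:

```python
SENSITIVE_KEYWORDS = {
    "password": "credentials that must never be shared in chat",
    "api key": "secret tokens that grant programmatic access",
    "credit card": "payment data that needs restricted handling",
}

def fallback_explanation(rule_text: str) -> str:
    # Used when the LLM times out or fails; purely keyword-driven
    lowered = rule_text.lower()
    hits = [desc for kw, desc in SENSITIVE_KEYWORDS.items() if kw in lowered]
    if hits:
        return "This rule guards " + "; ".join(hits) + "."
    return "This rule flags messages matching the configured pattern."
```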

Real-Time Visualization Components (Next.js Frontend)

The Next.js frontend includes three powerful visualization components:

  • Reasoning Path Visualizer: Step-by-step visualization of agent reasoning with animated progression, status indicators, and detailed metrics. Integrated into chat panel.
  • Tool Invocation Timeline: Visual timeline showing tool execution order, latency, and result counts. Integrated into chat panel.
  • Tenant Activity Heatmap: Query activity heatmap and per-tool usage trends. Integrated into analytics page.

All visualizations are accessible to all roles and automatically populate when agent responses include reasoning_trace and tool_traces data.

Environment Variables (excerpt)

Defined in env.example:

  • RAG_MCP_URL - Default: http://localhost:8900/rag (unified MCP server)
  • WEB_MCP_URL - Default: http://localhost:8900/web (unified MCP server for Google web search)
  • ADMIN_MCP_URL - Default: http://localhost:8900/admin (unified MCP server)
  • MCP_PORT - Port for unified MCP server (default: 8900)
  • MCP_HOST - Host for unified MCP server (default: 0.0.0.0)
  • POSTGRESQL_URL - PostgreSQL connection string with pgvector extension
  • OLLAMA_URL, OLLAMA_MODEL (or GROQ_API_KEY + LLM_BACKEND=groq)
  • SUPABASE_URL, SUPABASE_SERVICE_KEY - Required for Supabase backend (admin rules + analytics)
    • If not set, the system automatically falls back to SQLite in data/ directory
    • See SUPABASE_SETUP.md in the root directory for detailed setup instructions
  • GOOGLE_SEARCH_API_KEY, GOOGLE_SEARCH_CX_ID - Credentials for Google Programmable Search used by web.search
  • MCP_MEMORY_MAX_ITEMS - Maximum number of tool outputs to keep per session (default: 10)
  • MCP_MEMORY_TTL_SECONDS - Time-to-live for memory entries in seconds (default: 900)
  • APP_ENV, LOG_LEVEL, API_PORT

Update these before starting the servers to ensure the agent can reach every MCP endpoint and LLM runtime.

Note: The unified MCP server runs on a single port (default 8900) and handles all namespaced tools. The start.bat script automatically configures the correct URLs.

Supabase Configuration

Both RulesStore and AnalyticsStore support dual-backend storage with automatic detection:

Setup Steps

  1. Create Supabase tables:

    • Run supabase_admin_rules_table.sql in Supabase SQL Editor (from repo root)
    • Run supabase_analytics_tables.sql in Supabase SQL Editor (from repo root)
  2. Configure environment variables in .env:

    SUPABASE_URL=https://your-project-id.supabase.co
    SUPABASE_SERVICE_KEY=your_service_role_key_here
    
  3. Verify configuration:

    python verify_supabase_setup.py
    
  4. Migrate existing data (if you have SQLite data):

    python migrate_sqlite_to_supabase.py
    

How It Works

  • Automatic Detection: Both stores check for SUPABASE_URL and SUPABASE_SERVICE_KEY at initialization
  • Supabase First: If credentials are found, Supabase is used automatically
  • SQLite Fallback: If Supabase is not configured, SQLite databases in data/ are used
  • Startup Logging: Check startup logs to see which backend each store is using:
    • ✅ RulesStore: Using Supabase backend
    • ✅ AnalyticsStore: Using Supabase backend
    • Or ⚠️ RulesStore: Using SQLite backend if Supabase is not configured

Tables Used

  • Admin Rules: admin_rules table in Supabase
  • Analytics: tool_usage_events, redflag_violations, rag_search_events, agent_query_events

See SUPABASE_SETUP.md and SUPABASE_MIGRATION_COMPLETE.md in the root directory for detailed instructions and troubleshooting.

Unified MCP tool instructions

Agents that speak the Model Context Protocol should connect to the integrachat server ID defined in backend/mcp_server/server.py and call the namespaced tools directly:

Namespace Tool Purpose HTTP Endpoint
rag search Retrieve tenant-scoped document chunks POST /rag/search
rag ingest Chunk + store new knowledge POST /rag/ingest
rag list List all documents for tenant GET /rag/list?tenant_id={id}
rag delete Remove one/all stored documents DELETE /rag/delete/{id}?tenant_id={id} or DELETE /rag/delete-all?tenant_id={id}
web search Google Programmable Search (Custom Search API) POST /web/search
admin getRules Fetch tenant governance rules (list or detailed) POST /admin/getRules
admin addRule Insert or update a rule POST /admin/addRule
admin deleteRule Remove a rule by text POST /admin/deleteRule
admin logViolation Persist a red-flag event into analytics POST /admin/logViolation

Important Notes:

  • Always send tenant_id in the payload (or as query parameter for GET/DELETE requests) so the shared middleware can enforce isolation and log analytics
  • The MCP server automatically normalizes tenant IDs (handling whitespace) so documents can be listed and deleted consistently across operations
  • All endpoints support both POST (with JSON payload) and direct HTTP methods (GET for list, DELETE for delete operations)
  • RAG search uses a default threshold of 0.3 for better recall; adjust via threshold parameter if needed
  • Conversation Memory: Send session_id (or sessionId/conversation_id/conversationId) in tool payloads to enable short-term memory. Recent tool outputs are automatically stored and injected into subsequent tool calls as a memory field. Send end_session: true to clear memory for a session.

Troubleshooting

RAG Search Not Returning Results

  • Check similarity threshold: The default threshold is 0.3. If results are still not found, try lowering it to 0.2 or 0.1
  • Verify documents are ingested: Use GET /rag/list?tenant_id={id} to confirm documents exist for the tenant
  • Check tenant ID matching: Ensure the tenant_id used for search matches the one used for ingestion (normalization handles whitespace automatically)
  • Review search logs: Check MCP server logs for search metrics (hits_count, avg_score, top_score)

Agent Not Using RAG for Knowledge Base Questions

  • Verify RAG results are being found: Check the agent debug endpoint (POST /agent/debug) to see if RAG results are being pre-fetched
  • Check tool scores: The debug output shows rag_fitness score; if it's low (< 0.4), the agent may skip RAG
  • Ensure knowledge base content exists: Questions like "who is the admin" require relevant content in the knowledge base
  • Pattern matching: The tool selector automatically triggers RAG for patterns like "admin", "who is", "what is", but semantic similarity also plays a role

Document Ingestion Permission Errors

  • 403 Forbidden: If you see "Role 'viewer' is not permitted to perform 'ingest_documents'":
    • Change your role in the UI dropdown (top right) from "viewer" to "editor", "admin", or "owner"
    • Verify x-user-role header is being sent correctly (check backend logs for the DEBUG: messages)
    • Check that role is being propagated through the pipeline: route handler → process_ingestion → RAG client → MCP server
    • Review debug logs to see where role might be getting lost or defaulting to "viewer"
  • Role Propagation: The role must flow through:
    1. UI sends x-user-role header
    2. Route handler receives it as x_user_role parameter
    3. Route handler passes it to process_ingestion(user_role=...)
    4. process_ingestion passes it to rag_client.ingest_with_metadata(user_role=...)
    5. RAG client includes it in payload: {"user_role": "...", ...}
    6. MCP server extracts it via build_tenant_context() and uses for permission checks

Document Deletion Issues

  • 404 Not Found: Verify the document_id exists and belongs to the correct tenant
  • Tenant ID mismatch: The system normalizes tenant IDs, but ensure you're using the same tenant_id format as when documents were ingested
  • Check logs: Database deletion logs show detailed information about tenant ID matching and document existence

Supabase Configuration Issues

  • Data still going to SQLite: Check that SUPABASE_URL and SUPABASE_SERVICE_KEY are set correctly in .env (no quotes, no spaces)
  • Service role key errors: Make sure you're using the service_role key (not anon key) from Supabase Dashboard → Settings → API
  • Tables don't exist: Run supabase_admin_rules_table.sql and supabase_analytics_tables.sql in Supabase SQL Editor
  • Permission errors: Check RLS policies in Supabase allow service role access
  • Startup warnings: Check FastAPI startup logs to see which backend each store is using (✅ for Supabase, ⚠️ for SQLite fallback)