# Backend Documentation
This folder contains the production-ready FastAPI stack plus the companion MCP servers that power IntegraChat.
## Directory Overview
- `api/` - FastAPI application (routes, services, storage helpers, MCP clients)
- `mcp_server/` - Unified MCP server exposing rag/web/admin tools via namespaces
- `workers/` - Celery workers and schedulers for async ingestion + analytics maintenance
## Prerequisites
- Python 3.10+
- PostgreSQL (with the `vector` extension) for RAG data, or Supabase with pgvector enabled
- Supabase (recommended) for admin rules + analytics storage, with automatic SQLite fallback in `data/`
  - Both `RulesStore` and `AnalyticsStore` automatically detect and use Supabase when configured
  - Falls back to SQLite automatically if Supabase credentials are missing
  - See `SUPABASE_SETUP.md` in the root directory for setup instructions
- Optional: Ollama running locally (default) or Groq API credentials for remote LLMs
Create a virtual environment at the repo root, then:
```bash
pip install -r requirements.txt
cp env.example .env  # update MCP URLs + LLM settings
```
## Running the Services Locally
FastAPI core:

```bash
uvicorn backend.api.main:app --port 8000 --reload
```

Unified MCP server (rag/web/admin):

```bash
python backend/mcp_server/server.py
```

Or use the provided startup script:

```bash
start.bat  # Windows - launches the MCP server on port 8900 and FastAPI on port 8000
```

This single server (default port 8900) exposes the following namespaced tools:
- `rag.search` - Semantic search across tenant documents
- `rag.ingest` - Ingest text content into the knowledge base
- `rag.delete` - Delete individual or all documents for a tenant
- `rag.list` - List all documents for a tenant with pagination
- `web.search` - Google Programmable Search (Custom Search API) web search
- `admin.getRules`, `admin.addRule`, `admin.deleteRule`, `admin.logViolation`
HTTP endpoints (for direct API access):

- `GET /rag/list?tenant_id={id}&limit={n}&offset={n}` - List documents
- `POST /rag/ingest` - Ingest content
- `POST /rag/search` - Search documents (supports a `threshold` parameter, default: 0.3)
- `DELETE /rag/delete/{document_id}?tenant_id={id}` - Delete a specific document
- `DELETE /rag/delete-all?tenant_id={id}` - Delete all documents
- `POST /web/search` - Web search
- `POST /admin/*` - Admin operations
Optional workers (if running Celery-based ingestion/analytics jobs):
```bash
celery -A backend.workers.ingestion_worker worker --loglevel=info
celery -A backend.workers.analytics_worker worker --loglevel=info
```
The Gradio UI (`python app.py`) and the Next.js operator console (see `frontend/README.md`) both talk to the FastAPI layer at `http://localhost:8000`.
## Key Endpoints
All endpoints require the `x-tenant-id` header unless otherwise noted.
| Service | Path | Notes |
|---|---|---|
| Agent | POST /agent/message | Autonomous orchestration (RAG/Web/Admin/LLM) |
| Agent Debug | POST /agent/debug | Full reasoning trace + tool plan |
| Agent Plan | POST /agent/plan | Dry-run planning without executing tools |
| RAG | POST /rag/ingest-document | Rich ingestion (text, URL, metadata) |
| RAG | POST /rag/ingest-file | File upload (PDF/DOCX/TXT/MD) |
| RAG | GET /rag/list | Paginated document listing per tenant |
| RAG | DELETE /rag/delete/{document_id} | Delete a specific document |
| RAG | DELETE /rag/delete-all | Delete all documents for a tenant |
| Admin | POST /admin/rules | Regex + severity rule ingestion |
| Analytics | GET /analytics/overview | Summary metrics (queries, tokens, red flags) |
Refer to the root `README.md` for the complete endpoint tables.
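As a quick smoke test, a request to the agent endpoint can be assembled with the standard library. The tenant id, role, and message below are placeholder values, and the base URL assumes the default FastAPI port:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default FastAPI port

def build_agent_request(message: str, tenant_id: str, role: str = "viewer"):
    """Build a POST /agent/message request carrying the required x-tenant-id header."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/agent/message",
        data=body,
        headers={
            "Content-Type": "application/json",
            "x-tenant-id": tenant_id,  # required by all endpoints
            "x-user-role": role,       # needed for permission-gated actions
        },
        method="POST",
    )

req = build_agent_request("What is our refund policy?", "acme-corp")
# Send with urllib.request.urlopen(req) once the server is running.
```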
## Diagnostics & Tenant Isolation
Use the helper scripts in the repo root when validating backend changes:
- `python verify_tenant_isolation.py` - Exercises analytics logging, admin rule CRUD, API reachability, and proves RAG tenant isolation by ingesting + querying as multiple tenants.
- `python check_rag_database.py` - Talks directly to the pgvector database to list tenant IDs, preview stored chunks, and run safeguarded searches via `search_vectors()`. Helpful when troubleshooting suspected cross-tenant leakage.
- `python verify_supabase_setup.py` - Verifies the Supabase configuration and shows which backend (Supabase or SQLite) each store is using.
- `python check_supabase_rules.py` - Checks the Supabase admin rules configuration and RLS policies.
- `python migrate_sqlite_to_supabase.py` - One-shot migration script to copy existing SQLite data to Supabase.
- `python test_manual.py` - Legacy manual smoke-test harness (analytics store, admin rules, API surface).
Troubleshooting tip: If the isolation script reports a failure, first run `check_rag_database.py` to confirm documents are tagged with the correct `tenant_id`, then restart the unified MCP server so it reloads the updated SQL filtering logic.
## Recent Improvements
### Tenant ID Normalization
- All database operations now normalize tenant IDs to handle whitespace and formatting differences
- Documents can be listed and deleted consistently even if stored with slightly different tenant_id formatting
- The system automatically matches tenant IDs after normalization, ensuring operations work across different input formats
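The exact normalization rules live in the backend; as an illustrative sketch, assume trimming, case-folding, and collapsing internal whitespace so differently formatted ids address the same tenant:

```python
import re

def normalize_tenant_id(raw: str) -> str:
    """Illustrative normalization: trim, lowercase, collapse inner whitespace.

    The real backend may apply different rules; this only sketches the idea
    that '  Acme-Corp ' and 'acme-corp' should address the same tenant.
    """
    return re.sub(r"\s+", " ", raw.strip()).lower()

assert normalize_tenant_id("  Acme-Corp ") == normalize_tenant_id("acme-corp")
```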
### HTTP Endpoint Support
- Added GET support for the `/rag/list` endpoint (previously POST-only)
- Added DELETE support for the `/rag/delete/{document_id}` and `/rag/delete-all` endpoints
- All endpoints support both the MCP protocol (POST with JSON payload) and direct HTTP methods (GET/DELETE with query parameters)
### Response Format
- MCP server responses are wrapped in a standard format with `status`, `data`, and `metadata` fields
- The RAG client automatically unwraps responses for seamless integration
- Error responses include detailed messages for better debugging
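A minimal sketch of the wrap/unwrap round trip, assuming only the three field names above (the exact envelope semantics are an assumption):

```python
def wrap_response(data, **metadata):
    """Wrap tool output in the standard envelope (status/data/metadata)."""
    return {"status": "success", "data": data, "metadata": metadata}

def unwrap_response(envelope):
    """Roughly what the RAG client does before handing results onward."""
    if envelope.get("status") != "success":
        raise RuntimeError(envelope.get("data", "tool call failed"))
    return envelope["data"]

wrapped = wrap_response({"results": []}, latency_ms=42)
assert unwrap_response(wrapped) == {"results": []}
```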
### RAG Search Enhancements
- Cross-Encoder Re-ranking: A two-stage retrieval process that substantially improves accuracy:
  - Initial vector search retrieves top candidates using embeddings
  - A cross-encoder model (`cross-encoder/ms-marco-MiniLM-L-6-v2`) re-ranks the top 10 results
  - Final filtering by threshold and limit is applied
  - Seamlessly integrated with the existing search API
- Lowered default threshold from 0.5 to 0.3 for improved recall of relevant documents
- Intelligent fallback mechanism returns the top result even if similarity score is below threshold, ensuring knowledge base content is always accessible
- Configurable threshold via the `threshold` parameter in search requests (default: 0.3)
- Enhanced tool selection automatically triggers RAG for admin questions, fact lookups ("who is", "what is"), and internal knowledge queries
- Response unwrapping in MCP client ensures orchestrator receives properly formatted results for tool scoring and prompt building
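The second stage can be sketched as follows; `score_fn` stands in for the cross-encoder (in production, `cross-encoder/ms-marco-MiniLM-L-6-v2`), and the candidate/threshold values are illustrative:

```python
def rerank_and_filter(candidates, score_fn, threshold=0.3, limit=5, rerank_top=10):
    """Second stage of retrieval: re-score top candidates, filter, keep fallback.

    `candidates` are (doc, vector_score) pairs from the first-stage vector search.
    """
    # Re-rank only the best first-stage hits to bound cross-encoder cost
    top = sorted(candidates, key=lambda c: c[1], reverse=True)[:rerank_top]
    rescored = sorted(((doc, score_fn(doc)) for doc, _ in top),
                      key=lambda c: c[1], reverse=True)
    kept = [(d, s) for d, s in rescored if s >= threshold][:limit]
    # Fallback: always surface the best hit, even below threshold
    return kept or rescored[:1]

docs = [("a", 0.9), ("b", 0.6)]
assert rerank_and_filter(docs, lambda d: 0.1) == [("a", 0.1)]  # fallback path
```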
### Conversation Memory System
- Short-Term Memory: Automatic storage of tool outputs per session with configurable size limits (default: 10 outputs) and TTL (default: 900 seconds / 15 minutes)
- Session-Based Isolation: Memory is keyed by `session_id` (not `tenant_id`) for safety, ensuring no cross-tenant data mixing
- Automatic Injection: Recent memory is automatically injected into tool payloads as a `memory` field, enabling tools to make context-aware decisions in multi-step workflows
- Auto-Expiration: Memory entries automatically expire after the TTL or can be explicitly cleared via the `end_session`/`endSession` flag
- Configuration: Tune behavior via environment variables:
  - `MCP_MEMORY_MAX_ITEMS`: Maximum number of tool outputs to keep per session (default: 10)
  - `MCP_MEMORY_TTL_SECONDS`: Time-to-live for memory entries in seconds (default: 900)
- Comprehensive Testing: Full test suite in `backend/tests/test_conversation_memory.py` covering storage, retrieval, expiration, and multi-step workflows
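A size- and TTL-bounded session store can be sketched like this; the class name and `now` parameter (used here to make expiry testable) are illustrative, not the server's actual API:

```python
import time
from collections import defaultdict, deque

MCP_MEMORY_MAX_ITEMS = 10    # env-configurable in the real server
MCP_MEMORY_TTL_SECONDS = 900

class SessionMemory:
    """Illustrative short-term memory keyed by session_id (never tenant_id)."""

    def __init__(self, max_items=MCP_MEMORY_MAX_ITEMS, ttl=MCP_MEMORY_TTL_SECONDS):
        self.max_items, self.ttl = max_items, ttl
        self._store = defaultdict(deque)

    def add(self, session_id, tool_output, now=None):
        entries = self._store[session_id]
        entries.append((now if now is not None else time.time(), tool_output))
        while len(entries) > self.max_items:  # enforce size limit
            entries.popleft()

    def recent(self, session_id, now=None):
        now = now if now is not None else time.time()
        return [out for ts, out in self._store[session_id] if now - ts < self.ttl]

    def end_session(self, session_id):
        self._store.pop(session_id, None)  # explicit clear via end_session flag

mem = SessionMemory(max_items=2, ttl=900)
mem.add("s1", {"tool": "rag"}, now=0)
mem.add("s1", {"tool": "web"}, now=10)
assert mem.recent("s1", now=20) == [{"tool": "rag"}, {"tool": "web"}]
assert mem.recent("s1", now=905) == [{"tool": "web"}]  # first entry expired
```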
### AI-Generated KB Metadata
When ingesting documents, the system automatically extracts rich metadata:
- Title Extraction: From filename, URL, or content structure (with intelligent fallback)
- Summary Generation: 2-3 sentence summary via LLM (with keyword-based fallback)
- Tag Extraction: 5-8 relevant tags extracted from content
- Topic Identification: 3-5 main themes identified via LLM
- Date Detection: Multiple date formats automatically detected
- Quality Score: 0.0-1.0 score based on structure and completeness
Intelligent Fallback: When the LLM is unavailable or times out, the system uses keyword extraction and pattern matching to provide useful metadata.
Database Integration: Metadata is stored in a JSONB column (`metadata`) for flexible querying and enhanced RAG search. Migration script: `backend/scripts/migrate_add_metadata.py`.
API Response: The ingestion endpoints (`/rag/ingest-document`, `/rag/ingest-file`) now return `extracted_metadata` in the response.
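The keyword fallback can be sketched as below; the stopword list, tag limit, and title heuristic are assumptions, and the production extractor additionally scores quality and detects dates:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on"}

def fallback_metadata(filename: str, content: str, max_tags: int = 8):
    """Keyword-based fallback used when the LLM is unavailable (illustrative).

    Title comes from the filename; tags are the most frequent non-stopwords.
    """
    title = re.sub(r"[-_]+", " ", filename.rsplit(".", 1)[0]).strip().title()
    words = [w for w in re.findall(r"[a-z]{3,}", content.lower())
             if w not in STOPWORDS]
    tags = [w for w, _ in Counter(words).most_common(max_tags)]
    return {"title": title, "tags": tags}

meta = fallback_metadata(
    "refund_policy.md",
    "Refunds are issued within 30 days. Refund requests go to support.",
)
assert meta["title"] == "Refund Policy"
assert "refund" in meta["tags"]
```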
### Per-Tool Latency Prediction & Context-Aware Routing
The agent now uses sophisticated routing logic to optimize tool selection:
- Latency Prediction: Agent estimates expected latency before tool selection:
- RAG: 60-120ms (depends on result count)
- Web: 400-1800ms (network-dependent)
- Admin: <20ms (local regex matching)
- LLM: Variable based on model and token count
- Path Optimization: Agent chooses fastest tool sequence based on latency estimates
- Context-Aware Routing: Intelligent tool skipping based on previous outputs:
  - High RAG score (≥ 0.8) → skip web search
  - Critical admin violation → skip agent reasoning, immediate block
  - Relevant memory available → skip RAG, use memory instead
- Routing Hints: Context hints included in reasoning trace for transparency
Implementation: `backend/api/services/tool_metadata.py` defines latency estimates and routing logic; `backend/api/services/tool_selector.py` implements context-aware decisions.
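The routing rules above can be sketched as a small planner; the constants mirror the stated estimates but live in `backend/api/services/tool_metadata.py` in reality, and the context keys here are hypothetical:

```python
# Hypothetical (min, max) latency estimates in milliseconds, mirroring the
# figures above; LLM latency varies with model and token count.
LATENCY_ESTIMATES_MS = {"rag": (60, 120), "web": (400, 1800), "admin": (1, 20)}

def plan_tools(context):
    """Context-aware routing sketch: run cheap checks first, skip what we can."""
    plan = ["admin"]                      # <20 ms local regex check always runs
    if context.get("admin_severity") == "critical":
        return plan                       # immediate block, skip reasoning
    if context.get("memory_relevant"):
        return plan + ["llm"]             # reuse memory instead of RAG
    plan.append("rag")
    if context.get("rag_top_score", 0.0) < 0.8:
        plan.append("web")                # only pay 400-1800 ms when RAG is weak
    return plan + ["llm"]

def estimate_ms(plan):
    """Worst-case latency estimate for a planned tool sequence."""
    return sum(LATENCY_ESTIMATES_MS.get(t, (0, 0))[1] for t in plan)

assert plan_tools({"rag_top_score": 0.9}) == ["admin", "rag", "llm"]
assert plan_tools({"admin_severity": "critical"}) == ["admin"]
```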
### Tool Output Schemas
Every tool now returns strict JSON schemas for consistency:
- RAG: `{results: [...], top_score: float, latency_ms: int}`
- Web: `{results: [...], latency_ms: int}`
- Admin: `{violations: [...], severity: str, latency_ms: int}`
- LLM: `{text: str, tokens_used: int, latency_ms: int}`
Automatic Validation: All tool outputs are validated and formatted in the `AgentOrchestrator` before use, which makes debugging and monitoring simpler.
Schema Definitions: `backend/api/services/tool_metadata.py` contains `TOOL_OUTPUT_SCHEMAS` with validation functions.
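A minimal validation sketch against the four schemas above (the validator signature is an assumption; the real functions live in `tool_metadata.py`):

```python
TOOL_OUTPUT_SCHEMAS = {
    "rag": {"results": list, "top_score": float, "latency_ms": int},
    "web": {"results": list, "latency_ms": int},
    "admin": {"violations": list, "severity": str, "latency_ms": int},
    "llm": {"text": str, "tokens_used": int, "latency_ms": int},
}

def validate_tool_output(tool: str, output: dict) -> dict:
    """Check required fields and types before the orchestrator consumes output."""
    schema = TOOL_OUTPUT_SCHEMAS[tool]
    for field, expected in schema.items():
        if field not in output:
            raise ValueError(f"{tool}: missing field '{field}'")
        if not isinstance(output[field], expected):
            raise ValueError(f"{tool}: '{field}' should be {expected.__name__}")
    return output

validate_tool_output("rag", {"results": [], "top_score": 0.91, "latency_ms": 84})
```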
### Role Propagation & Permission Handling
- Fixed Role Propagation: The user role is now properly passed from the API route → `process_ingestion()` → RAG client → MCP server
  - `rag_client.ingest_with_metadata()` now accepts a `user_role` parameter
  - The role is included in the payload sent to the MCP server: `{"user_role": "owner", ...}`
  - The MCP server extracts the role from the payload via `build_tenant_context()` and uses it for permission checks
- Improved Error Handling:
- Permission errors (403) return clear messages with actionable guidance
- Error messages specify which roles are allowed for each action
- Frontend displays user-friendly error messages with instructions
- Debug Logging: Added debug logging in route handlers and services to trace role values:
- Logs role received in route handler
- Logs role passed to process_ingestion
- Logs role sent to RAG client
- Logs role received by MCP server
- Admin Question Handling: Fixed admin identity questions to prioritize RAG results from knowledge base
### UI Enhancements (`app.py`)
Knowledge Base Library Tab:
- Statistics cards showing document counts by type
- Interactive Plotly pie chart for document type distribution
- Semantic search with relevance scoring
- Type filtering (text, PDF, FAQ, link)
- Document management with preview and deletion
- Auto-refresh after operations
Admin Analytics Tab:
- Statistics cards for key metrics (queries, users, red flags, RAG searches)
- Interactive Plotly bar charts for tool usage, latency, and RAG quality
- Detailed tool usage table with performance metrics
- Formatted summary with dark theme styling
- Real-time data fetching and visualization
- Access: All roles can view analytics (viewer, editor, admin, owner)
Debug & Reasoning Tab:
- Reasoning trace analyzer showing step-by-step agent decision-making
- Tool invocation timeline with latency visualization
- AI metadata display after document ingestion (title, summary, tags, topics, quality score)
- Latency predictions shown in reasoning trace (estimated vs actual)
- Context-aware routing hints visualized (skip web/RAG/reasoning decisions)
- Tool output schemas displayed in debug view
- Formatted markdown output with detailed metrics
- Uses the `/agent/debug` endpoint for comprehensive insights
Modern UI/UX:
- Dark theme with white text for better readability
- Custom CSS styling for cards and charts
- Improved error handling and status messages
- Responsive layout with proper component scaling
### LLM-Guided Rule Explanation
The rule enhancement system includes intelligent fallback mechanisms:
- LLM Enhancement: When available, rules are enhanced with comprehensive explanations, examples, missing patterns, edge cases, and improvements
- Intelligent Fallback: When LLM times out or fails, the system automatically generates basic explanations using keyword extraction:
- Detects keywords (password, API key, credit card, sensitive data, etc.)
- Generates contextual explanations based on detected keywords
- Provides relevant examples (5-8 examples) based on rule type
- Suggests missing patterns (3-5 suggestions) for rule improvement
- Timeout Protection: 30-second timeout per rule with graceful fallback
- Chunk Processing: Bulk rule processing handles failures gracefully - one rule failure doesn't block others
This ensures users always receive useful rule explanations even when the LLM service is unavailable or slow.
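The keyword-driven fallback can be sketched as below; the keyword table and phrasing are illustrative only, and the real system tries the LLM first with its 30-second timeout:

```python
# Hypothetical keyword -> explanation fragments; the real detector covers more
# categories (sensitive data, credentials, payment info, etc.).
KEYWORD_HINTS = {
    "password": "blocks credential sharing",
    "api key": "prevents leaking API keys",
    "credit card": "flags payment-card numbers",
}

def fallback_explanation(rule_text: str) -> str:
    """Generate a basic explanation from detected keywords (illustrative)."""
    lowered = rule_text.lower()
    hits = [hint for kw, hint in KEYWORD_HINTS.items() if kw in lowered]
    if hits:
        return "This rule " + " and ".join(hits) + "."
    return "This rule restricts content matching the configured pattern."

assert "credential" in fallback_explanation("Never share your password")
```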
### Real-Time Visualization Components (Next.js Frontend)
The Next.js frontend includes three powerful visualization components:
- Reasoning Path Visualizer: Step-by-step visualization of agent reasoning with animated progression, status indicators, and detailed metrics. Integrated into chat panel.
- Tool Invocation Timeline: Visual timeline showing tool execution order, latency, and result counts. Integrated into chat panel.
- Tenant Activity Heatmap: Query activity heatmap and per-tool usage trends. Integrated into analytics page.
All visualizations are accessible to all roles and automatically populate when agent responses include `reasoning_trace` and `tool_traces` data.
## Environment Variables (excerpt)
Defined in `env.example`:

- `RAG_MCP_URL` - Default: `http://localhost:8900/rag` (unified MCP server)
- `WEB_MCP_URL` - Default: `http://localhost:8900/web` (unified MCP server for Google web search)
- `ADMIN_MCP_URL` - Default: `http://localhost:8900/admin` (unified MCP server)
- `MCP_PORT` - Port for the unified MCP server (default: 8900)
- `MCP_HOST` - Host for the unified MCP server (default: 0.0.0.0)
- `POSTGRESQL_URL` - PostgreSQL connection string with the pgvector extension
- `OLLAMA_URL`, `OLLAMA_MODEL` (or `GROQ_API_KEY` + `LLM_BACKEND=groq`)
- `SUPABASE_URL`, `SUPABASE_SERVICE_KEY` - Required for the Supabase backend (admin rules + analytics)
  - If not set, the system automatically falls back to SQLite in the `data/` directory
  - See `SUPABASE_SETUP.md` in the root directory for detailed setup instructions
- `GOOGLE_SEARCH_API_KEY`, `GOOGLE_SEARCH_CX_ID` - Credentials for Google Programmable Search used by `web.search`
- `MCP_MEMORY_MAX_ITEMS` - Maximum number of tool outputs to keep per session (default: 10)
- `MCP_MEMORY_TTL_SECONDS` - Time-to-live for memory entries in seconds (default: 900)
- `APP_ENV`, `LOG_LEVEL`, `API_PORT`
Update these before starting the servers to ensure the agent can reach every MCP endpoint and LLM runtime.
Note: The unified MCP server runs on a single port (default 8900) and handles all namespaced tools. The `start.bat` script automatically configures the correct URLs.
## Supabase Configuration
Both `RulesStore` and `AnalyticsStore` support dual-backend storage with automatic detection:
### Setup Steps
1. Create Supabase tables:
   - Run `supabase_admin_rules_table.sql` in the Supabase SQL Editor (from the repo root)
   - Run `supabase_analytics_tables.sql` in the Supabase SQL Editor (from the repo root)
2. Configure environment variables in `.env`:

   ```
   SUPABASE_URL=https://your-project-id.supabase.co
   SUPABASE_SERVICE_KEY=your_service_role_key_here
   ```

3. Verify the configuration:

   ```bash
   python verify_supabase_setup.py
   ```

4. Migrate existing data (if you have SQLite data):

   ```bash
   python migrate_sqlite_to_supabase.py
   ```
### How It Works
- Automatic Detection: Both stores check for `SUPABASE_URL` and `SUPABASE_SERVICE_KEY` at initialization
- Supabase First: If credentials are found, Supabase is used automatically
- SQLite Fallback: If Supabase is not configured, SQLite databases in `data/` are used
- Startup Logging: Check the startup logs to see which backend each store is using:
  - `✅ RulesStore: Using Supabase backend` and `✅ AnalyticsStore: Using Supabase backend`
  - Or `⚠️ RulesStore: Using SQLite backend` if Supabase is not configured
### Tables Used
- Admin Rules: the `admin_rules` table in Supabase
- Analytics: `tool_usage_events`, `redflag_violations`, `rag_search_events`, `agent_query_events`
See `SUPABASE_SETUP.md` and `SUPABASE_MIGRATION_COMPLETE.md` in the root directory for detailed instructions and troubleshooting.
## Unified MCP Tool Instructions
Agents that speak the Model Context Protocol should connect to the `integrachat` server id defined in `backend/mcp_server/server.py` and call the namespaced tools directly:
| Namespace | Tool | Purpose | HTTP Endpoint |
|---|---|---|---|
| rag | search | Retrieve tenant-scoped document chunks | POST /rag/search |
| rag | ingest | Chunk + store new knowledge | POST /rag/ingest |
| rag | list | List all documents for a tenant | GET /rag/list?tenant_id={id} |
| rag | delete | Remove one/all stored documents | DELETE /rag/delete/{id}?tenant_id={id} or DELETE /rag/delete-all?tenant_id={id} |
| web | search | Google Programmable Search (Custom Search API) | POST /web/search |
| admin | getRules | Fetch tenant governance rules (list or detailed) | POST /admin/getRules |
| admin | addRule | Insert or update a rule | POST /admin/addRule |
| admin | deleteRule | Remove a rule by text | POST /admin/deleteRule |
| admin | logViolation | Persist a red-flag event into analytics | POST /admin/logViolation |
Important Notes:
- Always send `tenant_id` in the payload (or as a query parameter for GET/DELETE requests) so the shared middleware can enforce isolation and log analytics
- The MCP server automatically normalizes tenant IDs to ensure consistent matching across operations
- All endpoints support both POST (with JSON payload) and direct HTTP methods (GET for list, DELETE for delete operations)
- Tenant ID normalization handles whitespace and ensures documents can be listed and deleted consistently
- RAG search uses a default threshold of 0.3 for better recall; adjust via the `threshold` parameter if needed
- Conversation Memory: Send `session_id` (or `sessionId`/`conversation_id`/`conversationId`) in tool payloads to enable short-term memory. Recent tool outputs are automatically stored and injected into subsequent tool calls as a `memory` field. Send `end_session: true` to clear memory for a session.
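A payload carrying the isolation and memory fields described above can be assembled like this; the field names follow the notes, while the helper itself and the example values are illustrative:

```python
def build_tool_payload(tenant_id, session_id=None, end_session=False, **params):
    """Assemble an MCP tool payload carrying isolation + memory fields."""
    payload = {"tenant_id": tenant_id, **params}
    if session_id:
        payload["session_id"] = session_id   # enables short-term memory
    if end_session:
        payload["end_session"] = True        # clears memory for the session
    return payload

p = build_tool_payload("acme-corp", session_id="s-123", query="refund policy")
assert p["tenant_id"] == "acme-corp" and p["session_id"] == "s-123"
```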
## Troubleshooting
### RAG Search Not Returning Results
- Check the similarity threshold: The default threshold is 0.3. If results are still not found, try lowering it to 0.2 or 0.1
- Verify documents are ingested: Use `GET /rag/list?tenant_id={id}` to confirm documents exist for the tenant
- Check tenant ID matching: Ensure the `tenant_id` used for search matches the one used for ingestion (normalization handles whitespace automatically)
- Review search logs: Check the MCP server logs for search metrics (`hits_count`, `avg_score`, `top_score`)
### Agent Not Using RAG for Knowledge Base Questions
- Verify RAG results are being found: Check the agent debug endpoint (`POST /agent/debug`) to see if RAG results are being pre-fetched
- Check tool scores: The debug output shows a `rag_fitness` score; if it's low (< 0.4), the agent may skip RAG
- Ensure knowledge base content exists: Questions like "who is the admin" require relevant content in the knowledge base
- Pattern matching: The tool selector automatically triggers RAG for patterns like "admin", "who is", "what is", but semantic similarity also plays a role
### Document Ingestion Permission Errors
- 403 Forbidden: If you see "Role 'viewer' is not permitted to perform 'ingest_documents'":
  - Change your role in the UI dropdown (top right) from "viewer" to "editor", "admin", or "owner"
  - Verify the `x-user-role` header is being sent correctly (check the backend logs for `DEBUG:` messages)
  - Check that the role is being propagated through the pipeline: route handler → `process_ingestion` → RAG client → MCP server
  - Review the debug logs to see where the role might be getting lost or defaulting to "viewer"
- Role Propagation: The role must flow through:
  1. The UI sends the `x-user-role` header
  2. The route handler receives it as the `x_user_role` parameter
  3. The route handler passes it to `process_ingestion(user_role=...)`
  4. `process_ingestion` passes it to `rag_client.ingest_with_metadata(user_role=...)`
  5. The RAG client includes it in the payload: `{"user_role": "...", ...}`
  6. The MCP server extracts it via `build_tenant_context()` and uses it for permission checks
### Document Deletion Issues
- 404 Not Found: Verify the `document_id` exists and belongs to the correct tenant
- Tenant ID mismatch: The system normalizes tenant IDs, but ensure you're using the same `tenant_id` format as when the documents were ingested
- Check logs: Database deletion logs show detailed information about tenant ID matching and document existence
### Supabase Configuration Issues
- Data still going to SQLite: Check that `SUPABASE_URL` and `SUPABASE_SERVICE_KEY` are set correctly in `.env` (no quotes, no spaces)
- Service role key errors: Make sure you're using the service_role key (not the anon key) from Supabase Dashboard → Settings → API
- Tables don't exist: Run `supabase_admin_rules_table.sql` and `supabase_analytics_tables.sql` in the Supabase SQL Editor
- Permission errors: Check that the RLS policies in Supabase allow service-role access
- Startup warnings: Check the FastAPI startup logs to see which backend each store is using (`✅` for Supabase, `⚠️` for SQLite fallback)