Backend Documentation
This folder contains the production-ready FastAPI stack plus the companion MCP servers that power IntegraChat.
Directory Overview
- api/ - FastAPI application (routes, services, storage helpers, MCP clients)
- mcp_server/ - Unified MCP server exposing rag/web/admin tools via namespaces
- workers/ - Celery workers and schedulers for async ingestion + analytics maintenance
Prerequisites
- Python 3.10+
- PostgreSQL (with the vector extension) for RAG data, or Supabase with pgvector enabled
- Supabase (recommended) for admin rules + analytics storage, with automatic SQLite fallback in data/
  - Both RulesStore and AnalyticsStore automatically detect and use Supabase when configured
  - Falls back to SQLite automatically if Supabase credentials are missing
  - See SUPABASE_SETUP.md in the root directory for setup instructions
- Optional: Ollama running locally (default) or Groq API credentials for remote LLMs
Create a virtual environment at the repo root, then:
pip install -r requirements.txt
cp env.example .env # update MCP URLs + LLM settings
Running the Services Locally
FastAPI core
uvicorn backend.api.main:app --port 8000 --reload
Unified MCP server (rag/web/admin)
python backend/mcp_server/server.py
Or use the provided startup script:
start.bat # Windows - launches MCP server on port 8900 and FastAPI on port 8000
This single server (default port 8900) exposes the following namespaced tools:
- rag.search - Semantic search across tenant documents
- rag.ingest - Ingest text content into knowledge base
- rag.delete - Delete individual or all documents for a tenant
- rag.list - List all documents for a tenant with pagination
- web.search - Google Programmable Search (Custom Search API) web search
- admin.getRules, admin.addRule, admin.deleteRule, admin.logViolation
HTTP Endpoints (for direct API access):
- GET /rag/list?tenant_id={id}&limit={n}&offset={n} - List documents
- POST /rag/ingest - Ingest content
- POST /rag/search - Search documents (supports threshold parameter, default: 0.3)
- DELETE /rag/delete/{document_id}?tenant_id={id} - Delete specific document
- DELETE /rag/delete-all?tenant_id={id} - Delete all documents
- POST /web/search - Web search
- POST /admin/* - Admin operations
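As a rough illustration of the direct HTTP form, the listing URL can be assembled with the standard library. This is only a sketch: it assumes the unified MCP server is on its default port 8900, the helper name build_list_url and the default limit are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumption: the unified MCP server is reachable on its default port.
MCP_BASE_URL = "http://localhost:8900"


def build_list_url(tenant_id: str, limit: int = 20, offset: int = 0) -> str:
    """Build the GET /rag/list URL (the default limit of 20 is illustrative)."""
    # urlencode escapes the tenant ID so spaces or symbols cannot break the query.
    query = urlencode({"tenant_id": tenant_id, "limit": limit, "offset": offset})
    return f"{MCP_BASE_URL}/rag/list?{query}"


print(build_list_url("acme", limit=10, offset=5))
```

The resulting string can be passed to any HTTP client; escaping the tenant ID matters because it is taken from caller input.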
Optional workers (if running Celery-based ingestion/analytics jobs):
celery -A backend.workers.ingestion_worker worker --loglevel=info
celery -A backend.workers.analytics_worker worker --loglevel=info
The Gradio UI (python app.py) talks to the FastAPI layer at http://localhost:8000.
Key Endpoints
All endpoints require the x-tenant-id header unless otherwise noted.
| Service | Path | Notes |
|---|---|---|
| Agent | POST /agent/message | Autonomous orchestration (RAG/Web/Admin/LLM) |
| Agent Debug | POST /agent/debug | Full reasoning trace + tool plan |
| Agent Plan | POST /agent/plan | Dry-run planning without executing tools |
| RAG | POST /rag/ingest-document | Rich ingestion (text, URL, metadata) |
| RAG | POST /rag/ingest-file | File upload (PDF/DOCX/TXT/MD) |
| RAG | GET /rag/list | Paginated document listing per tenant (requires x-tenant-id header) |
| RAG | DELETE /rag/delete/{document_id} | Delete specific document (requires x-tenant-id header) |
| RAG | DELETE /rag/delete-all | Delete all documents for tenant (requires x-tenant-id header) |
| Admin | POST /admin/rules | Regex + severity rule ingestion |
| Analytics | GET /analytics/overview | Summary metrics (queries, tokens, red flags) |
Refer to the root README.md for the complete endpoint tables.
Diagnostics & Tenant Isolation
When validating backend changes:
- API Testing: Use the FastAPI interactive docs at http://localhost:8000/docs to test endpoints
- Database Inspection: Connect directly to your PostgreSQL/Supabase instance to verify tenant isolation and check that documents are tagged with the correct tenant_id
- Log Monitoring: Check FastAPI and MCP server logs for detailed error messages and debugging information
Troubleshooting tip: If you suspect tenant isolation issues, check database queries to confirm they include WHERE tenant_id = ... filters, then restart the unified MCP server so it reloads the updated SQL filtering logic.
Recent Improvements
Tenant ID Normalization
- All database operations now normalize tenant IDs to handle whitespace and formatting differences
- Documents can be listed and deleted consistently even if stored with slightly different tenant_id formatting
- The system automatically matches tenant IDs after normalization, ensuring operations work across different input formats
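A minimal sketch of what such normalization could look like is shown here. The function name normalize_tenant_id is hypothetical, and the case-folding step is an assumption of this sketch; the text above only confirms whitespace handling.

```python
def normalize_tenant_id(raw: str) -> str:
    """Hypothetical normalizer for tenant IDs.

    Trims surrounding whitespace and collapses inner whitespace runs.
    Case-folding is an assumption of this sketch, not confirmed behavior.
    """
    collapsed = " ".join(raw.split())  # split() drops leading/trailing/repeated whitespace
    return collapsed.casefold()


# Two differently formatted inputs map to the same key.
print(normalize_tenant_id("  Acme\tCorp \n") == normalize_tenant_id("acme corp"))
```

Applying the same function on both the write path (ingestion) and the read path (list, search, delete) is what makes the matching consistent.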
HTTP Endpoint Support
- Added GET support for /rag/list endpoint (previously POST-only)
- Added DELETE support for /rag/delete/{document_id} and /rag/delete-all endpoints
- All endpoints support both MCP protocol (POST with JSON payload) and direct HTTP methods (GET/DELETE with query parameters)
Response Format
- MCP server responses are wrapped in a standard format with status, data, and metadata fields
- RAG client automatically unwraps responses for seamless integration
- Error responses include detailed messages for better debugging
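The unwrapping step might look roughly like the following. Only the status, data, and metadata field names come from the description above; the "error" status value, the metadata "message" key, and the names MCPError and unwrap_response are assumptions made for illustration.

```python
from typing import Any


class MCPError(RuntimeError):
    """Hypothetical error type raised when a wrapped response reports a failure."""


def unwrap_response(envelope: Any) -> Any:
    """Return the data field of a wrapped response, passing other values through."""
    if not isinstance(envelope, dict) or "status" not in envelope:
        return envelope  # not wrapped (or already unwrapped)
    if envelope["status"] == "error":  # assumption: "error" marks a failure
        meta = envelope.get("metadata") or {}
        raise MCPError(meta.get("message", "unknown error"))  # assumed key
    return envelope.get("data")


wrapped = {"status": "ok", "data": {"results": []}, "metadata": {"latency_ms": 72}}
print(unwrap_response(wrapped))
```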
RAG Search Enhancements
- Cross-Encoder Re-ranking: Two-stage retrieval process that improves ranking accuracy:
  - Initial vector search retrieves top candidates using embeddings
  - Cross-encoder model (cross-encoder/ms-marco-MiniLM-L-6-v2) re-ranks top 10 results
  - Final filtering by threshold and limit applied
  - Seamlessly integrated with existing search API
- Lowered default threshold from 0.5 to 0.3 for improved recall of relevant documents
- Intelligent fallback mechanism returns the top result even if similarity score is below threshold, ensuring knowledge base content is always accessible
- Configurable threshold via threshold parameter in search requests (default: 0.3)
- Enhanced tool selection automatically triggers RAG for admin questions, fact lookups ("who is", "what is"), and internal knowledge queries
- Response unwrapping in MCP client ensures orchestrator receives properly formatted results for tool scoring and prompt building
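The threshold filter and top-result fallback can be sketched as below. This covers only the final filtering stage, not the embedding search or cross-encoder scoring; the hit shape (a dict with a score key), the function name filter_hits, and the default limit are assumptions of the sketch.

```python
def filter_hits(hits: list[dict], threshold: float = 0.3, limit: int = 5) -> list[dict]:
    """Keep hits at or above the threshold; fall back to the single best hit."""
    ranked = sorted(hits, key=lambda h: h["score"], reverse=True)
    kept = [h for h in ranked if h["score"] >= threshold][:limit]
    if not kept and ranked:
        # Fallback: nothing cleared the threshold, so return the top result anyway.
        kept = ranked[:1]
    return kept


hits = [{"text": "a", "score": 0.12}, {"text": "b", "score": 0.25}]
print(filter_hits(hits))  # every score is below 0.3, so only the best hit survives
```

The fallback trades precision for availability: a weak match is returned rather than an empty result, so callers that need strict filtering should check the returned score themselves.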
Conversation Memory System
- Short-Term Memory: Automatic storage of tool outputs per session with configurable size limits (default: 10 outputs) and TTL (default: 900 seconds / 15 minutes)
- Session-Based Isolation: Memory is keyed by session_id (not tenant_id) for safety, ensuring no cross-tenant data mixing
- Automatic Injection: Recent memory is automatically injected into tool payloads as a memory field, enabling tools to make context-aware decisions in multi-step workflows
- Auto-Expiration: Memory entries automatically expire after TTL or can be explicitly cleared via end_session/endSession flag
- Configuration: Tune behavior via environment variables:
  - MCP_MEMORY_MAX_ITEMS: Maximum number of tool outputs to keep per session (default: 10)
  - MCP_MEMORY_TTL_SECONDS: Time-to-live for memory entries in seconds (default: 900)
- Comprehensive Testing: Conversation memory system tested through integration with agent orchestrator
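A bounded, expiring per-session store along these lines could be sketched as follows. The class name SessionMemory and its methods are hypothetical and do not mirror the actual implementation; the injectable clock is there only to make expiry easy to test.

```python
import time
from collections import deque
from typing import Any, Callable


class SessionMemory:
    """Hypothetical short-term memory keyed by session_id."""

    def __init__(self, max_items: int = 10, ttl_seconds: float = 900,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_items = max_items
        self._ttl = ttl_seconds
        self._clock = clock
        self._sessions: dict[str, deque] = {}

    def remember(self, session_id: str, tool_output: Any) -> None:
        # A deque with maxlen silently evicts the oldest entry when full.
        entries = self._sessions.setdefault(session_id, deque(maxlen=self._max_items))
        entries.append((self._clock(), tool_output))

    def recall(self, session_id: str) -> list:
        entries = self._sessions.get(session_id)
        if not entries:
            return []
        cutoff = self._clock() - self._ttl
        while entries and entries[0][0] < cutoff:  # drop entries older than the TTL
            entries.popleft()
        return [output for _, output in entries]

    def end_session(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)


memory = SessionMemory()
memory.remember("session-1", {"tool": "rag", "top_score": 0.82})
print(memory.recall("session-1"))
```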
AI-Generated KB Metadata
When ingesting documents, the system automatically extracts rich metadata:
- Title Extraction: From filename, URL, or content structure (with intelligent fallback)
- Summary Generation: 2-3 sentence summary via LLM (with keyword-based fallback)
- Tag Extraction: 5-8 relevant tags extracted from content
- Topic Identification: 3-5 main themes identified via LLM
- Date Detection: Multiple date formats automatically detected
- Quality Score: 0.0-1.0 score based on structure and completeness
Intelligent Fallback: When LLM is unavailable or times out, uses keyword extraction and pattern matching to provide useful metadata.
Database Integration: Metadata stored in JSONB column (metadata) for flexible querying and enhanced RAG search. Migration script: backend/scripts/migrate_add_metadata.py.
API Response: Ingestion endpoints (/rag/ingest-document, /rag/ingest-file) now return extracted_metadata in the response.
Per-Tool Latency Prediction & Context-Aware Routing
The agent now uses sophisticated routing logic to optimize tool selection:
- Latency Prediction: Agent estimates expected latency before tool selection:
  - RAG: 60-120ms (depends on result count)
  - Web: 400-1800ms (network-dependent)
  - Admin: <20ms (local regex matching)
  - LLM: Variable based on model and token count
- Path Optimization: Agent chooses fastest tool sequence based on latency estimates
- Context-Aware Routing: Intelligent tool skipping based on previous outputs:
  - High RAG score (≥0.8) → Skip web search
  - Critical admin violation → Skip agent reasoning, immediate block
  - Relevant memory available → Skip RAG, use memory instead
- Routing Hints: Context hints included in reasoning trace for transparency
Implementation: backend/api/services/tool_metadata.py defines latency estimates and routing logic. backend/api/services/tool_selector.py implements context-aware decisions.
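The skip rules above can be expressed as a small pure function. This is a sketch under stated assumptions, not the logic in tool_selector.py: the name route_tools, the hint strings, and ordering tools by their worst-case latency estimate are all illustrative choices.

```python
# Latency ranges in milliseconds, taken from the estimates listed above.
ESTIMATED_LATENCY_MS = {"rag": (60, 120), "web": (400, 1800), "admin": (0, 20)}


def route_tools(candidates, *, rag_top_score=None,
                critical_violation=False, memory_relevant=False):
    """Return (tools_to_run, routing_hints) after applying the skip rules."""
    if critical_violation:
        return [], ["critical admin violation: block immediately"]
    tools, hints = list(candidates), []
    if memory_relevant and "rag" in tools:
        tools.remove("rag")
        hints.append("relevant memory available: skip rag")
    if rag_top_score is not None and rag_top_score >= 0.8 and "web" in tools:
        tools.remove("web")
        hints.append("high rag score: skip web search")
    # Assumption: cheaper tools run first; unknown tools sort last.
    tools.sort(key=lambda t: ESTIMATED_LATENCY_MS.get(t, (0, float("inf")))[1])
    return tools, hints


print(route_tools(["web", "rag", "admin"], rag_top_score=0.9))
```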
Tool Output Schemas
Every tool now returns output in a strict JSON schema for consistency:
- RAG: {results: [...], top_score: float, latency_ms: int}
- Web: {results: [...], latency_ms: int}
- Admin: {violations: [...], severity: str, latency_ms: int}
- LLM: {text: str, tokens_used: int, latency_ms: int}
Automatic Validation: All tool outputs are validated and formatted in AgentOrchestrator before use, which makes debugging and monitoring simpler.
Schema Definitions: backend/api/services/tool_metadata.py contains TOOL_OUTPUT_SCHEMAS with validation functions.
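A field-and-type check against the shapes listed above might look like this. It is a simplified stand-in for illustration only: TOOL_OUTPUT_FIELDS and validate_tool_output are hypothetical names and do not reproduce the real TOOL_OUTPUT_SCHEMAS or its validation functions.

```python
# Expected fields per tool, copied from the schema list above.
TOOL_OUTPUT_FIELDS = {
    "rag": {"results": list, "top_score": float, "latency_ms": int},
    "web": {"results": list, "latency_ms": int},
    "admin": {"violations": list, "severity": str, "latency_ms": int},
    "llm": {"text": str, "tokens_used": int, "latency_ms": int},
}


def validate_tool_output(tool: str, output: dict) -> list[str]:
    """Return a list of problems; an empty list means the output is valid."""
    problems = []
    for field, expected in TOOL_OUTPUT_FIELDS[tool].items():
        if field not in output:
            problems.append(f"missing field: {field}")
        elif not isinstance(output[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")
    return problems


print(validate_tool_output("rag", {"results": [], "latency_ms": 80}))
```

Returning a list of problems rather than raising on the first one makes the result easier to surface in a debug view.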
UI & UX Improvements
- Document Display: Fixed document list formatting - now properly displays as table rows instead of [object Object]
- Rule Deletion: Enhanced to support both rule numbers and rule text for easier deletion
- LLM Enhancement Toggle: Added option to disable LLM enhancement for faster rule addition
- Timeout Improvements: Increased timeout for bulk rule operations from 45s to 180s
Role Propagation & Permission Handling
- Fixed Role Propagation: User role is now properly passed from API route → process_ingestion() → RAG client → MCP server
  - rag_client.ingest_with_metadata() now accepts user_role parameter
  - Role is included in payload sent to MCP server: {"user_role": "owner", ...}
  - MCP server extracts role from payload via build_tenant_context() and uses it for permission checks
- Improved Error Handling:
  - Permission errors (403) return clear messages with actionable guidance
  - Error messages specify which roles are allowed for each action
- Debug Logging: Added debug logging in route handlers and services to trace role values:
  - Logs role received in route handler
  - Logs role passed to process_ingestion
  - Logs role sent to RAG client
  - Logs role received by MCP server
- Admin Question Handling: Fixed admin identity questions to prioritize RAG results from knowledge base
UI Enhancements (app.py)
Knowledge Base Library Tab:
- Statistics cards showing document counts by type
- Interactive Plotly pie chart for document type distribution
- Semantic search with relevance scoring
- Type filtering (text, PDF, FAQ, link)
- Document management with preview and deletion
- Auto-refresh after operations
Admin Analytics Tab:
- Statistics cards for key metrics (queries, users, red flags, RAG searches)
- Interactive Plotly bar charts for tool usage, latency, and RAG quality
- Detailed tool usage table with performance metrics
- Formatted summary with dark theme styling
- Real-time data fetching and visualization
- Access: All roles can view analytics (viewer, editor, admin, owner)
Debug & Reasoning Tab:
- Reasoning trace analyzer showing step-by-step agent decision-making
- Tool invocation timeline with latency visualization
- AI metadata display after document ingestion (title, summary, tags, topics, quality score)
- Latency predictions shown in reasoning trace (estimated vs actual)
- Context-aware routing hints visualized (skip web/RAG/reasoning decisions)
- Tool output schemas displayed in debug view
- Formatted markdown output with detailed metrics
- Uses /agent/debug endpoint for comprehensive insights
Modern UI/UX:
- Dark theme with white text for better readability
- Custom CSS styling for cards and charts
- Improved error handling and status messages
- Responsive layout with proper component scaling
LLM-Guided Rule Explanation
The rule enhancement system includes intelligent fallback mechanisms:
- LLM Enhancement: When available, rules are enhanced with comprehensive explanations, examples, missing patterns, edge cases, and improvements
- Intelligent Fallback: When LLM times out or fails, the system automatically generates basic explanations using keyword extraction:
- Detects keywords (password, API key, credit card, sensitive data, etc.)
- Generates contextual explanations based on detected keywords
- Provides relevant examples (5-8 examples) based on rule type
- Suggests missing patterns (3-5 suggestions) for rule improvement
- Timeout Protection: 30-second timeout per rule with graceful fallback
- Chunk Processing: Bulk rule processing handles failures gracefully - one rule failure doesn't block others
This ensures users always receive useful rule explanations even when the LLM service is unavailable or slow.
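The timeout-with-fallback pattern can be sketched with asyncio.wait_for. The keyword list comes from the bullets above; the function names, the fallback wording, and the llm_explain callable are hypothetical stand-ins for the real enhancement service.

```python
import asyncio

KEYWORDS = ("password", "api key", "credit card", "sensitive data")


def keyword_explanation(rule_text: str) -> str:
    """Build a basic explanation from detected keywords (illustrative wording)."""
    found = [k for k in KEYWORDS if k in rule_text.lower()]
    if found:
        return f"Flags content mentioning: {', '.join(found)}."
    return "Flags content matching this rule."


async def explain_rule(rule_text, llm_explain, timeout_s: float = 30.0) -> str:
    """Try the LLM first; on timeout or any failure, use the keyword fallback."""
    try:
        return await asyncio.wait_for(llm_explain(rule_text), timeout=timeout_s)
    except Exception:  # includes asyncio.TimeoutError
        return keyword_explanation(rule_text)


async def explain_rules(rules, llm_explain, timeout_s: float = 30.0) -> list[str]:
    # Each rule has its own fallback, so one failure cannot block the others.
    return await asyncio.gather(*(explain_rule(r, llm_explain, timeout_s) for r in rules))
```

With this shape, a slow or failing llm_explain degrades a single rule to the keyword explanation while the rest of the batch still completes.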
Real-Time Visualization Components (Gradio UI)
The Gradio UI includes powerful visualization components:
- Reasoning Path Visualizer: Step-by-step visualization of agent reasoning with status indicators and detailed metrics. Available in Debug & Reasoning tab.
- Tool Invocation Timeline: Visual timeline showing tool execution order, latency, and result counts. Available in Debug & Reasoning tab.
- Analytics Dashboard: Query activity and per-tool usage trends. Available in Admin Analytics tab.
All visualizations automatically populate when agent responses include reasoning_trace and tool_traces data.
Context Engineering (Latest)
The system implements comprehensive context engineering strategies based on Anthropic's best practices:
ContextEngineer Service (backend/api/services/context_engineer.py):
- ContextScratchpad: Structured note-taking with objectives, architectural decisions, and unresolved issues
- ContextCompressor: High-fidelity compaction and tool result clearing
- ContextSelector: Just-in-time context loading and memory selection
- ContextIsolator: Isolation of large tool outputs
Compaction Strategy:
- Monitors token usage and compresses at 80% threshold
- Uses tool result clearing first (safest), then full compaction
- Preserves architectural decisions, unresolved issues, and implementation details
- Targets 60% token usage after compression
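The two-stage strategy can be illustrated with a toy compactor. Everything here is an assumption made for the sketch: the message shape, the pinned flag standing in for preserved decisions and unresolved issues, the whitespace token count, and dropping the oldest unpinned messages as a stand-in for full compaction (which would normally summarize instead).

```python
COMPRESS_AT = 0.80   # start compressing at 80% of the token budget
TARGET = 0.60        # aim for 60% after compression
CLEARED = "[tool result cleared]"


def count_tokens(message: dict) -> int:
    # Crude whitespace count; a real tokenizer would replace this.
    return len(message["content"].split())


def total_tokens(messages: list[dict]) -> int:
    return sum(count_tokens(m) for m in messages)


def compact(messages: list[dict], token_limit: int) -> list[dict]:
    if total_tokens(messages) < COMPRESS_AT * token_limit:
        return list(messages)  # under the threshold: leave untouched
    # Stage 1 (safest): clear raw tool results but keep the message slots.
    result = [{**m, "content": CLEARED} if m["role"] == "tool" else m for m in messages]
    # Stage 2: drop the oldest unpinned messages until the target is met.
    i = 0
    while total_tokens(result) > TARGET * token_limit and i < len(result):
        if result[i].get("pinned"):
            i += 1  # pinned notes (decisions, open issues) are preserved
        else:
            del result[i]
    return result
```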
Structured Prompts:
- All prompts use XML-style sections (<system>, <background_information>, <instructions>)
- Clear organization improves model understanding
- Better separation of concerns
Integration Points:
- Conversation history compression in agent_orchestrator.py
- Tool output compression for RAG and web search
- Structured scratchpad context in all prompts
- Memory selection before tool selection
Benefits:
- Reduced token usage and API costs
- Support for longer conversations
- Better agent coherence across extended interactions
- Improved performance through structured context
Context engineering features are integrated throughout the agent orchestrator and MCP server.
Environment Variables (excerpt)
Defined in env.example:
- RAG_MCP_URL - Default: http://localhost:8900/rag (unified MCP server)
- WEB_MCP_URL - Default: http://localhost:8900/web (unified MCP server for Google web search)
- ADMIN_MCP_URL - Default: http://localhost:8900/admin (unified MCP server)
- MCP_PORT - Port for unified MCP server (default: 8900)
- MCP_HOST - Host for unified MCP server (default: 0.0.0.0)
- POSTGRESQL_URL - PostgreSQL connection string with pgvector extension
- OLLAMA_URL, OLLAMA_MODEL (or GROQ_API_KEY + LLM_BACKEND=groq)
- SUPABASE_URL, SUPABASE_SERVICE_KEY - Required for Supabase backend (admin rules + analytics)
  - If not set, the system automatically falls back to SQLite in data/ directory
- GOOGLE_SEARCH_API_KEY, GOOGLE_SEARCH_CX_ID - Credentials for Google Programmable Search used by web.search
- MCP_MEMORY_MAX_ITEMS - Maximum number of tool outputs to keep per session (default: 10)
- MCP_MEMORY_TTL_SECONDS - Time-to-live for memory entries in seconds (default: 900)
- APP_ENV, LOG_LEVEL, API_PORT
Update these before starting the servers to ensure the agent can reach every MCP endpoint and LLM runtime.
Note: The unified MCP server runs on a single port (default 8900) and handles all namespaced tools. The start.bat script automatically configures the correct URLs.
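For reference, reading a few of these variables with their documented defaults could look like the sketch below. The MCPSettings dataclass and load_settings helper are hypothetical; only the variable names and default values are taken from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MCPSettings:
    rag_url: str
    web_url: str
    admin_url: str
    memory_max_items: int
    memory_ttl_seconds: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> MCPSettings:
    """Read settings from the environment, applying the documented defaults."""
    env = os.environ if env is None else env
    return MCPSettings(
        rag_url=env.get("RAG_MCP_URL", "http://localhost:8900/rag"),
        web_url=env.get("WEB_MCP_URL", "http://localhost:8900/web"),
        admin_url=env.get("ADMIN_MCP_URL", "http://localhost:8900/admin"),
        memory_max_items=int(env.get("MCP_MEMORY_MAX_ITEMS", "10")),
        memory_ttl_seconds=int(env.get("MCP_MEMORY_TTL_SECONDS", "900")),
    )


print(load_settings({}))  # an empty mapping yields the defaults
```

Passing a plain mapping instead of os.environ keeps the loader easy to exercise in tests.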
Supabase Configuration
Both RulesStore and AnalyticsStore support dual-backend storage with automatic detection:
Setup Steps
Create Supabase tables:
- Run supabase_admin_rules_table.sql in Supabase SQL Editor (from repo root)
- Run supabase_analytics_tables.sql in Supabase SQL Editor (from repo root)
Configure environment variables in .env:
SUPABASE_URL=https://your-project-id.supabase.co
SUPABASE_SERVICE_KEY=your_service_role_key_here
Verify configuration: Check that your Supabase project is accessible and tables are created correctly.
Migrate existing data (if you have SQLite data): Use Supabase migration tools or database export/import methods.
How It Works
- Automatic Detection: Both stores check for SUPABASE_URL and SUPABASE_SERVICE_KEY at initialization
- Supabase First: If credentials are found, Supabase is used automatically
- SQLite Fallback: If Supabase is not configured, SQLite databases in data/ are used
- Startup Logging: Check startup logs to see which backend each store is using:
  - ✅ RulesStore: Using Supabase backend
  - ✅ AnalyticsStore: Using Supabase backend
  - Or ⚠️ RulesStore: Using SQLite backend if Supabase is not configured
Tables Used
- Admin Rules: admin_rules table in Supabase
- Analytics: tool_usage_events, redflag_violations, rag_search_events, agent_query_events
For detailed Supabase setup instructions, refer to the Supabase documentation and ensure tables are created correctly.
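The backend detection described under How It Works reduces to a check on two variables. A minimal sketch, assuming a hypothetical helper named choose_backend and treating blank values as missing (the latter is an assumption, not confirmed behavior):

```python
from typing import Mapping


def choose_backend(env: Mapping[str, str]) -> str:
    """Return "supabase" when both credentials are present, else "sqlite"."""
    url = (env.get("SUPABASE_URL") or "").strip()
    key = (env.get("SUPABASE_SERVICE_KEY") or "").strip()
    return "supabase" if url and key else "sqlite"


print(choose_backend({"SUPABASE_URL": "https://your-project-id.supabase.co"}))
```

Requiring both values means a half-configured environment falls back to SQLite rather than failing at the first query.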
Unified MCP tool instructions
Agents that speak the Model Context Protocol should connect to the integrachat server id defined in backend/mcp_server/server.py and call the namespaced tools directly:
| Namespace | Tool | Purpose | HTTP Endpoint |
|---|---|---|---|
| rag | search | Retrieve tenant-scoped document chunks | POST /rag/search |
| rag | ingest | Chunk + store new knowledge | POST /rag/ingest |
| rag | list | List all documents for tenant | GET /rag/list?tenant_id={id} |
| rag | delete | Remove one/all stored documents | DELETE /rag/delete/{id}?tenant_id={id} or DELETE /rag/delete-all?tenant_id={id} |
| web | search | Google Programmable Search (Custom Search API) | POST /web/search |
| admin | getRules | Fetch tenant governance rules (list or detailed) | POST /admin/getRules |
| admin | addRule | Insert or update a rule | POST /admin/addRule |
| admin | deleteRule | Remove a rule by text | POST /admin/deleteRule |
| admin | logViolation | Persist a red-flag event into analytics | POST /admin/logViolation |
Important Notes:
- Always send tenant_id in the payload (or as query parameter for GET/DELETE requests) so the shared middleware can enforce isolation and log analytics
- The MCP server automatically normalizes tenant IDs to ensure consistent matching across operations
- All endpoints support both POST (with JSON payload) and direct HTTP methods (GET for list, DELETE for delete operations)
- Tenant ID normalization handles whitespace and ensures documents can be listed and deleted consistently
- RAG search uses a default threshold of 0.3 for better recall; adjust via threshold parameter if needed
- Conversation Memory: Send session_id (or sessionId/conversation_id/conversationId) in tool payloads to enable short-term memory. Recent tool outputs are automatically stored and injected into subsequent tool calls as a memory field. Send end_session: true to clear memory for a session.
Troubleshooting
RAG Search Not Returning Results
- Check similarity threshold: The default threshold is 0.3. If results are still not found, try lowering it to 0.2 or 0.1
- Verify documents are ingested: Use GET /rag/list?tenant_id={id} to confirm documents exist for the tenant
- Check tenant ID matching: Ensure the tenant_id used for search matches the one used for ingestion (normalization handles whitespace automatically)
- Review search logs: Check MCP server logs for search metrics (hits_count, avg_score, top_score)
Agent Not Using RAG for Knowledge Base Questions
- Verify RAG results are being found: Check the agent debug endpoint (POST /agent/debug) to see if RAG results are being pre-fetched
- Check tool scores: The debug output shows rag_fitness score; if it's low (< 0.4), the agent may skip RAG
- Ensure knowledge base content exists: Questions like "who is the admin" require relevant content in the knowledge base
- Pattern matching: The tool selector automatically triggers RAG for patterns like "admin", "who is", "what is", but semantic similarity also plays a role
Document Ingestion Permission Errors
- 403 Forbidden: If you see "Role 'viewer' is not permitted to perform 'ingest_documents'":
  - Ensure the x-user-role header is set to "editor", "admin", or "owner"
  - Verify x-user-role header is being sent correctly (check backend logs for DEBUG: messages)
  - Check that role is being propagated through the pipeline: route handler → process_ingestion → RAG client → MCP server
  - Review debug logs to see where role might be getting lost or defaulting to "viewer"
- Role Propagation: The role must flow through:
  - Client sends x-user-role header
  - Route handler receives it as x_user_role parameter
  - Route handler passes it to process_ingestion(user_role=...)
  - process_ingestion passes it to rag_client.ingest_with_metadata(user_role=...)
  - RAG client includes it in payload: {"user_role": "...", ...}
  - MCP server extracts it via build_tenant_context() and uses it for permission checks
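The role check at the end of that chain can be sketched as below. The allowed roles for ingestion and deletion and the error wording follow this README; the action name delete_documents, the helper check_permission, and the appended list of allowed roles are assumptions of the sketch.

```python
from typing import Optional

ALLOWED_ROLES = {
    "ingest_documents": {"editor", "admin", "owner"},
    "delete_documents": {"admin", "owner"},  # hypothetical action name
}


def check_permission(role: Optional[str], action: str) -> None:
    """Raise PermissionError when the role may not perform the action."""
    role = role or "viewer"  # a lost or missing role defaults to "viewer"
    allowed = ALLOWED_ROLES[action]
    if role not in allowed:
        raise PermissionError(
            f"Role '{role}' is not permitted to perform '{action}'. "
            f"Allowed roles: {', '.join(sorted(allowed))}"
        )


check_permission("editor", "ingest_documents")  # passes silently
```

Defaulting a missing role to "viewer" is why a dropped x-user-role header anywhere in the pipeline surfaces as a 403 rather than a crash.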
Document Deletion Issues
- 404 Not Found: Verify the document_id exists and belongs to the correct tenant
- Tenant ID mismatch: The system normalizes tenant IDs, but ensure you're using the same tenant_id format as when documents were ingested
- Check logs: Database deletion logs show detailed information about tenant ID matching and document existence
- Role Propagation: Ensure user role is being passed correctly - deletion requires admin or owner role. The role is now properly propagated from API request → API → RAG Client → MCP Server
Rule Management Issues
- Timeout Errors: If rule enhancement times out:
  - Disable LLM enhancement by not setting enhance=true in the request
  - Add rules in smaller batches (1-3 rules at a time)
  - Enhancement timeout increased to 180s per chunk (5 rules)
- Rule Deletion 404: Can now delete by rule number or full text. If using number, ensure it's a valid index (1-based)
- Permission Errors: Rule management requires admin or owner role. Check that role is set correctly in the x-user-role header
Supabase Configuration Issues
- Data still going to SQLite: Check that
SUPABASE_URLandSUPABASE_SERVICE_KEYare set correctly in.env(no quotes, no spaces) - Service role key errors: Make sure you're using the service_role key (not anon key) from Supabase Dashboard β Settings β API
- Tables don't exist: Run
supabase_admin_rules_table.sqlandsupabase_analytics_tables.sqlin Supabase SQL Editor - Permission errors: Check RLS policies in Supabase allow service role access
- Startup warnings: Check FastAPI startup logs to see which backend each store is using (
βfor Supabase,β οΈfor SQLite fallback)