	# Backend Documentation

	This folder contains the production-ready FastAPI stack plus the companion MCP servers that power IntegraChat.

	## Directory Overview

	- `api/` – FastAPI application (routes, services, storage helpers, MCP clients)
	- `mcp_server/` – Unified MCP server exposing rag/web/admin tools via namespaces
	- `workers/` – Celery workers and schedulers for async ingestion + analytics maintenance

	## Prerequisites

	- Python 3.10+
	- PostgreSQL (with the `vector` extension) for RAG data, or Supabase with pgvector enabled
	- Supabase (recommended) for admin rules + analytics storage, with automatic SQLite fallback in `data/`
	  - Both `RulesStore` and `AnalyticsStore` automatically detect and use Supabase when configured
	  - Falls back to SQLite automatically if Supabase credentials are missing
	  - See `SUPABASE_SETUP.md` in the root directory for setup instructions
	- Optional: Ollama running locally (default) or Groq API credentials for remote LLMs

	Create a virtual environment at the repo root, then:

	```bash
	pip install -r requirements.txt
	cp env.example .env # update MCP URLs + LLM settings
	```

	## Running the Services Locally

	1. FastAPI core
	```bash
	uvicorn backend.api.main:app --port 8000 --reload
	```

	2. Unified MCP server (rag/web/admin)
	```bash
	python backend/mcp_server/server.py
	```
	Or use the provided startup script:
	```bash
	start.bat # Windows - launches MCP server on port 8900 and FastAPI on port 8000
	```

	This single server (default port 8900) exposes the following namespaced tools:
	- `rag.search` - Semantic search across tenant documents
	- `rag.ingest` - Ingest text content into knowledge base
	- `rag.delete` - Delete individual or all documents for a tenant
	- `rag.list` - List all documents for a tenant with pagination
	- `web.search` - Google Programmable Search (Custom Search API) web search
	- `admin.getRules`, `admin.addRule`, `admin.deleteRule`, `admin.logViolation`

	HTTP Endpoints (for direct API access):
	- `GET /rag/list?tenant_id={id}&limit={n}&offset={n}` - List documents
	- `POST /rag/ingest` - Ingest content
	- `POST /rag/search` - Search documents (supports `threshold` parameter, default: 0.3)
	- `DELETE /rag/delete/{document_id}?tenant_id={id}` - Delete specific document
	- `DELETE /rag/delete-all?tenant_id={id}` - Delete all documents
	- `POST /web/search` - Web search
	- `POST /admin/*` - Admin operations

	3. Optional workers (if running Celery-based ingestion/analytics jobs):
	```bash
	celery -A backend.workers.ingestion_worker worker --loglevel=info
	celery -A backend.workers.analytics_worker worker --loglevel=info
	```

	The Gradio UI (`python app.py`) talks to the FastAPI layer at `http://localhost:8000`.
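
As a quick illustration of the MCP HTTP endpoints listed above, the following sketch builds (but does not send) a `POST /rag/search` request using only the standard library. The `query` field name and the `build_rag_search_request` helper are assumptions made for this example; only `tenant_id`, `threshold`, the path, and the default port come from this README.

```python
import json
from urllib.request import Request

# Default unified MCP server address described in this README.
MCP_BASE_URL = "http://localhost:8900"


def build_rag_search_request(tenant_id: str, query: str, threshold: float = 0.3) -> Request:
    """Build a POST /rag/search request without sending it."""
    # "query" is an assumed field name; check the server code for the real payload shape.
    payload = {"tenant_id": tenant_id, "query": query, "threshold": threshold}
    return Request(
        url=f"{MCP_BASE_URL}/rag/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_rag_search_request("tenant-a", "who is the admin")
```

With the MCP server running, the request could be sent with `urllib.request.urlopen(request)`.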

	## Key Endpoints

	All endpoints require the `x-tenant-id` header unless otherwise noted.

	| Service | Path | Notes |
	| --- | --- | --- |
	| Agent | `POST /agent/message` | Autonomous orchestration (RAG/Web/Admin/LLM) |
	| Agent Debug | `POST /agent/debug` | Full reasoning trace + tool plan |
	| Agent Plan | `POST /agent/plan` | Dry-run planning without executing tools |
	| RAG | `POST /rag/ingest-document` | Rich ingestion (text, URL, metadata) |
	| RAG | `POST /rag/ingest-file` | File upload (PDF/DOCX/TXT/MD) |
	| RAG | `GET /rag/list` | Paginated document listing per tenant (requires `x-tenant-id` header) |
	| RAG | `DELETE /rag/delete/{document_id}` | Delete specific document (requires `x-tenant-id` header) |
	| RAG | `DELETE /rag/delete-all` | Delete all documents for tenant (requires `x-tenant-id` header) |
	| Admin | `POST /admin/rules` | Regex + severity rule ingestion |
	| Analytics | `GET /analytics/overview` | Summary metrics (queries, tokens, red flags) |

	Refer to the root `README.md` for the complete endpoint tables.

	## Diagnostics & Tenant Isolation

	When validating backend changes:

	- API Testing: Use the FastAPI interactive docs at `http://localhost:8000/docs` to test endpoints
	- Database Inspection: Connect directly to your PostgreSQL/Supabase instance to verify tenant isolation and check that documents are tagged with the correct `tenant_id`
	- Log Monitoring: Check FastAPI and MCP server logs for detailed error messages and debugging information

	> Troubleshooting tip: If you suspect tenant isolation issues, check database queries to confirm they include `WHERE tenant_id = ...` filters, then restart the unified MCP server so it reloads the updated SQL filtering logic.

	## Recent Improvements

	### Tenant ID Normalization
	- All database operations now normalize tenant IDs to handle whitespace and formatting differences
	- Documents can be listed and deleted consistently even if stored with slightly different tenant_id formatting
	- The system automatically matches tenant IDs after normalization, ensuring operations work across different input formats
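
A minimal sketch of what such normalization might look like is shown here. The exact rules used by the backend are not documented in this README, so the trim-and-collapse behaviour and the `normalize_tenant_id` / `same_tenant` names are assumptions.

```python
def normalize_tenant_id(raw: str) -> str:
    """Trim surrounding whitespace and collapse internal runs to one space (assumed rule)."""
    return " ".join(raw.split())


def same_tenant(a: str, b: str) -> bool:
    """Compare two tenant IDs after normalization."""
    return normalize_tenant_id(a) == normalize_tenant_id(b)
```

Under these assumptions, `" tenant-a\n"` and `"tenant-a"` refer to the same tenant, while IDs that differ in any non-whitespace character do not.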

	### HTTP Endpoint Support
	- Added GET support for `/rag/list` endpoint (previously POST-only)
	- Added DELETE support for `/rag/delete/{document_id}` and `/rag/delete-all` endpoints
	- All endpoints support both MCP protocol (POST with JSON payload) and direct HTTP methods (GET/DELETE with query parameters)

	### Response Format
	- MCP server responses are wrapped in a standard format with `status`, `data`, and `metadata` fields
	- RAG client automatically unwraps responses for seamless integration
	- Error responses include detailed messages for better debugging
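
The unwrapping step could be sketched as follows. The `"ok"` status value, the `error` field, and the `MCPError` class are assumptions for illustration; only the `status`, `data`, and `metadata` field names come from the description above.

```python
from typing import Any


class MCPError(Exception):
    """Raised when a wrapped response reports a failure (hypothetical class)."""


def unwrap_response(response: dict[str, Any]) -> Any:
    """Return the `data` payload of a wrapped MCP response."""
    # Responses that are not wrapped are passed through unchanged.
    if not {"status", "data"} <= response.keys():
        return response
    # "ok" as the success value is an assumption.
    if response["status"] != "ok":
        raise MCPError(str(response.get("error", "MCP call failed")))
    return response["data"]
```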

	### RAG Search Enhancements
	- Cross-Encoder Re-ranking: Two-stage retrieval process that improves ranking accuracy:
	  - Initial vector search retrieves top candidates using embeddings
	  - Cross-encoder model (`cross-encoder/ms-marco-MiniLM-L-6-v2`) re-ranks top 10 results
	  - Final filtering by threshold and limit applied
	  - Seamlessly integrated with existing search API
	- Lowered default threshold from 0.5 to 0.3 for improved recall of relevant documents
	- Intelligent fallback mechanism returns the top result even if similarity score is below threshold, ensuring knowledge base content is always accessible
	- Configurable threshold via `threshold` parameter in search requests (default: 0.3)
	- Enhanced tool selection automatically triggers RAG for admin questions, fact lookups ("who is", "what is"), and internal knowledge queries
	- Response unwrapping in MCP client ensures orchestrator receives properly formatted results for tool scoring and prompt building
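
The re-rank, filter, and fallback flow described above can be sketched without any model dependency by passing the scorer in as a function. This is an illustration only: `rerank_and_filter` and `overlap_score` are hypothetical names, the toy word-overlap scorer stands in for the cross-encoder, and applying the threshold to the re-ranked score is an assumption.

```python
from typing import Callable


def overlap_score(query: str, doc: str) -> float:
    """Toy scorer standing in for the cross-encoder: fraction of query words found in doc."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def rerank_and_filter(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.3,
    limit: int = 5,
    rerank_top_n: int = 10,
) -> list[tuple[str, float]]:
    """Re-rank the top candidates, filter by threshold, and fall back to the best hit."""
    # Stage 1 output is assumed to arrive already sorted by vector similarity.
    top = candidates[:rerank_top_n]
    # Stage 2: score each (query, document) pair and sort best-first.
    scored = sorted(((doc, score_fn(query, doc)) for doc in top), key=lambda p: p[1], reverse=True)
    kept = [pair for pair in scored if pair[1] >= threshold][:limit]
    # Fallback: return the single best result even when it is below the threshold.
    if not kept and scored:
        kept = [scored[0]]
    return kept
```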

	### Conversation Memory System
	- Short-Term Memory: Automatic storage of tool outputs per session with configurable size limits (default: 10 outputs) and TTL (default: 900 seconds / 15 minutes)
	- Session-Based Isolation: Memory is keyed by `session_id` (not `tenant_id`) for safety, ensuring no cross-tenant data mixing
	- Automatic Injection: Recent memory is automatically injected into tool payloads as a `memory` field, enabling tools to make context-aware decisions in multi-step workflows
	- Auto-Expiration: Memory entries automatically expire after TTL or can be explicitly cleared via `end_session`/`endSession` flag
	- Configuration: Tune behavior via environment variables:
	  - `MCP_MEMORY_MAX_ITEMS`: Maximum number of tool outputs to keep per session (default: 10)
	  - `MCP_MEMORY_TTL_SECONDS`: Time-to-live for memory entries in seconds (default: 900)
	- Comprehensive Testing: Conversation memory system tested through integration with agent orchestrator
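
A minimal in-process sketch of session-keyed memory with a size cap and TTL follows. The `SessionMemory` class and its method names are assumptions for illustration and are not taken from the backend code; the environment variable names and defaults are the ones documented above.

```python
import os
import time
from collections import deque
from typing import Any, Callable

# Defaults mirror the documented environment variables.
MAX_ITEMS = int(os.environ.get("MCP_MEMORY_MAX_ITEMS", "10"))
TTL_SECONDS = int(os.environ.get("MCP_MEMORY_TTL_SECONDS", "900"))


class SessionMemory:
    """Short-term store of tool outputs, keyed by session_id (hypothetical class)."""

    def __init__(self, max_items: int = MAX_ITEMS, ttl_seconds: float = TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_items = max_items
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._sessions: dict[str, deque] = {}

    def add(self, session_id: str, tool_output: Any) -> None:
        # deque(maxlen=...) silently drops the oldest entry once the cap is reached.
        entries = self._sessions.setdefault(session_id, deque(maxlen=self.max_items))
        entries.append((self._clock(), tool_output))

    def recent(self, session_id: str) -> list[Any]:
        # Only entries younger than the TTL are returned.
        cutoff = self._clock() - self.ttl_seconds
        return [out for ts, out in self._sessions.get(session_id, ()) if ts >= cutoff]

    def end_session(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)
```

The list returned by `recent()` is what would be injected into a tool payload as the `memory` field.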

	### AI-Generated KB Metadata

	When ingesting documents, the system automatically extracts rich metadata:

	- Title Extraction: From filename, URL, or content structure (with intelligent fallback)
	- Summary Generation: 2-3 sentence summary via LLM (with keyword-based fallback)
	- Tag Extraction: 5-8 relevant tags extracted from content
	- Topic Identification: 3-5 main themes identified via LLM
	- Date Detection: Multiple date formats automatically detected
	- Quality Score: 0.0-1.0 score based on structure and completeness

	Intelligent Fallback: When LLM is unavailable or times out, uses keyword extraction and pattern matching to provide useful metadata.

	Database Integration: Metadata stored in JSONB column (`metadata`) for flexible querying and enhanced RAG search. Migration script: `backend/scripts/migrate_add_metadata.py`.

	API Response: Ingestion endpoints (`/rag/ingest-document`, `/rag/ingest-file`) now return `extracted_metadata` in the response.

	### Per-Tool Latency Prediction & Context-Aware Routing

	The agent now uses latency estimates and the outputs of earlier tools to optimize tool selection:

	- Latency Prediction: Agent estimates expected latency before tool selection:
	  - RAG: 60-120ms (depends on result count)
	  - Web: 400-1800ms (network-dependent)
	  - Admin: <20ms (local regex matching)
	  - LLM: Variable based on model and token count
	- Path Optimization: Agent chooses fastest tool sequence based on latency estimates
	- Context-Aware Routing: Intelligent tool skipping based on previous outputs:
	  - High RAG score (≥0.8) → Skip web search
	  - Critical admin violation → Skip agent reasoning, immediate block
	  - Relevant memory available → Skip RAG, use memory instead
	- Routing Hints: Context hints included in reasoning trace for transparency

	Implementation: `backend/api/services/tool_metadata.py` defines latency estimates and routing logic. `backend/api/services/tool_selector.py` implements context-aware decisions.
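
The skip rules above can be expressed as a small decision function. This is a sketch under assumptions: `RoutingContext`, `routing_hints`, and the hint strings are hypothetical names and do not reflect the actual contents of `tool_metadata.py` or `tool_selector.py`.

```python
from dataclasses import dataclass
from typing import Optional

# Latency ranges in milliseconds as documented above (LLM omitted because it is variable).
LATENCY_ESTIMATES_MS = {"rag": (60, 120), "web": (400, 1800), "admin": (0, 20)}


@dataclass
class RoutingContext:
    rag_top_score: Optional[float] = None
    admin_severity: Optional[str] = None
    has_relevant_memory: bool = False


def routing_hints(ctx: RoutingContext) -> list[str]:
    """Return the skip decisions implied by the outputs seen so far."""
    # A critical admin violation short-circuits everything else.
    if ctx.admin_severity == "critical":
        return ["block"]
    hints = []
    if ctx.has_relevant_memory:
        hints.append("skip_rag")
    if ctx.rag_top_score is not None and ctx.rag_top_score >= 0.8:
        hints.append("skip_web")
    return hints
```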

	### Tool Output Schemas

	Every tool now returns output that conforms to a strict JSON schema, for consistency:

	- RAG: `{results: [...], top_score: float, latency_ms: int}`
	- Web: `{results: [...], latency_ms: int}`
	- Admin: `{violations: [...], severity: str, latency_ms: int}`
	- LLM: `{text: str, tokens_used: int, latency_ms: int}`

	Automatic Validation: All tool outputs are validated and formatted in `AgentOrchestrator` before use, which makes debugging and monitoring simpler.

	Schema Definitions: `backend/api/services/tool_metadata.py` contains `TOOL_OUTPUT_SCHEMAS` with validation functions.
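
To illustrate the kind of check involved, here is a minimal validator built from the four output shapes listed above. `SKETCH_SCHEMAS` and `validate_tool_output` are hypothetical names; the real `TOOL_OUTPUT_SCHEMAS` structure and validation functions may differ.

```python
from typing import Any

# Field names and types taken from the schema list above.
SKETCH_SCHEMAS: dict[str, dict[str, type]] = {
    "rag": {"results": list, "top_score": float, "latency_ms": int},
    "web": {"results": list, "latency_ms": int},
    "admin": {"violations": list, "severity": str, "latency_ms": int},
    "llm": {"text": str, "tokens_used": int, "latency_ms": int},
}


def validate_tool_output(tool: str, output: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the output matches the schema."""
    errors = []
    for field, expected in SKETCH_SCHEMAS[tool].items():
        if field not in output:
            errors.append(f"missing field: {field}")
        elif not isinstance(output[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors
```

This sketch is deliberately strict: an integer `top_score` would be rejected because it is not a `float`.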

	### UI & UX Improvements
	- Document Display: Fixed document list formatting - now properly displays as table rows instead of `[object Object]`
	- Rule Deletion: Enhanced to support both rule numbers and rule text for easier deletion
	- LLM Enhancement Toggle: Added option to disable LLM enhancement for faster rule addition
	- Timeout Improvements: Increased timeout for bulk rule operations from 45s to 180s

	### Role Propagation & Permission Handling
	- Fixed Role Propagation: User role is now properly passed from API route → `process_ingestion()` → RAG client → MCP server
	  - `rag_client.ingest_with_metadata()` now accepts `user_role` parameter
	  - Role is included in payload sent to MCP server: `{"user_role": "owner", ...}`
	  - MCP server extracts role from payload via `build_tenant_context()` and uses it for permission checks
	- Improved Error Handling:
	  - Permission errors (403) return clear messages with actionable guidance
	  - Error messages specify which roles are allowed for each action
	- Debug Logging: Added debug logging in route handlers and services to trace role values:
	  - Logs role received in route handler
	  - Logs role passed to process_ingestion
	  - Logs role sent to RAG client
	  - Logs role received by MCP server
	- Admin Question Handling: Fixed admin identity questions to prioritize RAG results from knowledge base

	### UI Enhancements (app.py)
	- Knowledge Base Library Tab:
	  - Statistics cards showing document counts by type
	  - Interactive Plotly pie chart for document type distribution
	  - Semantic search with relevance scoring
	  - Type filtering (text, PDF, FAQ, link)
	  - Document management with preview and deletion
	  - Auto-refresh after operations

	- Admin Analytics Tab:
	  - Statistics cards for key metrics (queries, users, red flags, RAG searches)
	  - Interactive Plotly bar charts for tool usage, latency, and RAG quality
	  - Detailed tool usage table with performance metrics
	  - Formatted summary with dark theme styling
	  - Real-time data fetching and visualization
	  - Access: All roles can view analytics (viewer, editor, admin, owner)

	- Debug & Reasoning Tab:
	  - Reasoning trace analyzer showing step-by-step agent decision-making
	  - Tool invocation timeline with latency visualization
	  - AI metadata display after document ingestion (title, summary, tags, topics, quality score)
	  - Latency predictions shown in reasoning trace (estimated vs actual)
	  - Context-aware routing hints visualized (skip web/RAG/reasoning decisions)
	  - Tool output schemas displayed in debug view
	  - Formatted markdown output with detailed metrics
	  - Uses `/agent/debug` endpoint for comprehensive insights

	- Modern UI/UX:
	  - Dark theme with white text for better readability
	  - Custom CSS styling for cards and charts
	  - Improved error handling and status messages
	  - Responsive layout with proper component scaling

	### LLM-Guided Rule Explanation

	The rule enhancement system includes intelligent fallback mechanisms:

	- LLM Enhancement: When available, rules are enhanced with comprehensive explanations, examples, missing patterns, edge cases, and improvements
	- Intelligent Fallback: When LLM times out or fails, the system automatically generates basic explanations using keyword extraction:
	  - Detects keywords (password, API key, credit card, sensitive data, etc.)
	  - Generates contextual explanations based on detected keywords
	  - Provides relevant examples (5-8 examples) based on rule type
	  - Suggests missing patterns (3-5 suggestions) for rule improvement
	- Timeout Protection: 30-second timeout per rule with graceful fallback
	- Chunk Processing: Bulk rule processing handles failures gracefully - one rule failure doesn't block others

	This ensures users always receive useful rule explanations even when the LLM service is unavailable or slow.
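
The timeout-with-fallback pattern can be sketched with `asyncio.wait_for`. Everything named here (`KEYWORD_HINTS`, `keyword_explanation`, `explain_rule`, and the hint texts) is hypothetical; only the 30-second timeout and the keyword-based fallback idea come from the description above.

```python
import asyncio
from typing import Awaitable, Callable

# Illustrative keyword hints; the real fallback texts are not documented here.
KEYWORD_HINTS = {
    "password": "Blocks content that exposes passwords.",
    "api key": "Blocks content that exposes API keys.",
    "credit card": "Blocks content that exposes payment card numbers.",
}


def keyword_explanation(rule_text: str) -> str:
    """Build a basic explanation from keywords detected in the rule text."""
    lowered = rule_text.lower()
    matched = [hint for keyword, hint in KEYWORD_HINTS.items() if keyword in lowered]
    return " ".join(matched) or "General governance rule."


async def explain_rule(
    rule_text: str,
    llm_explain: Callable[[str], Awaitable[str]],
    timeout_seconds: float = 30.0,
) -> str:
    """Try the LLM first; fall back to keyword extraction on timeout or failure."""
    try:
        return await asyncio.wait_for(llm_explain(rule_text), timeout=timeout_seconds)
    except Exception:
        # Covers asyncio.TimeoutError as well as errors raised by the LLM call itself.
        return keyword_explanation(rule_text)
```

Because each rule is handled independently, a bulk job can call `explain_rule` once per rule and one slow or failing rule does not block the others.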

	### Real-Time Visualization Components (Gradio UI)

	The Gradio UI includes powerful visualization components:

	- Reasoning Path Visualizer: Step-by-step visualization of agent reasoning with status indicators and detailed metrics. Available in Debug & Reasoning tab.
	- Tool Invocation Timeline: Visual timeline showing tool execution order, latency, and result counts. Available in Debug & Reasoning tab.
	- Analytics Dashboard: Query activity and per-tool usage trends. Available in Admin Analytics tab.

	All visualizations automatically populate when agent responses include `reasoning_trace` and `tool_traces` data.

	### Context Engineering (Latest)

	The system implements comprehensive context engineering strategies based on Anthropic's best practices:

	- ContextEngineer Service (`backend/api/services/context_engineer.py`):
	  - ContextScratchpad: Structured note-taking with objectives, architectural decisions, and unresolved issues
	  - ContextCompressor: High-fidelity compaction and tool result clearing
	  - ContextSelector: Just-in-time context loading and memory selection
	  - ContextIsolator: Isolation of large tool outputs

	- Compaction Strategy:
	  - Monitors token usage and compresses at 80% threshold
	  - Uses tool result clearing first (safest), then full compaction
	  - Preserves architectural decisions, unresolved issues, and implementation details
	  - Targets 60% token usage after compression

	- Structured Prompts:
	  - All prompts use XML-style sections (`<system>`, `<background_information>`, `<instructions>`)
	  - Clear organization improves model understanding
	  - Better separation of concerns

	- Integration Points:
	  - Conversation history compression in `agent_orchestrator.py`
	  - Tool output compression for RAG and web search
	  - Structured scratchpad context in all prompts
	  - Memory selection before tool selection

	- Benefits:
	  - Reduced token usage and API costs
	  - Support for longer conversations
	  - Better agent coherence across extended interactions
	  - Improved performance through structured context

	Context engineering features are integrated throughout the agent orchestrator and MCP server.
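
The compaction strategy (trigger at 80% of the budget, clear tool results first, then compact toward 60%) can be sketched as below. This is a simplified illustration: `count_tokens` is a crude word count, `compact` is a hypothetical function, and the second stage simply drops the oldest non-system messages, whereas the real `ContextCompressor` is described as preserving key decisions and unresolved issues.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def total_tokens(messages: list[dict]) -> int:
    return sum(count_tokens(m["content"]) for m in messages)


def compact(messages: list[dict], budget: int,
            trigger: float = 0.8, target: float = 0.6) -> list[dict]:
    """Shrink a message history once it reaches `trigger` of the token budget."""
    if total_tokens(messages) < trigger * budget:
        return messages
    compacted = [dict(m) for m in messages]  # copy so the caller's history is untouched
    # Stage 1: clear tool results, oldest first, until the target is met.
    for message in compacted:
        if total_tokens(compacted) <= target * budget:
            break
        if message["role"] == "tool":
            message["content"] = "[tool result cleared]"
    # Stage 2 (stand-in for full compaction): drop the oldest non-system messages.
    while total_tokens(compacted) > target * budget:
        index = next((i for i, m in enumerate(compacted) if m["role"] != "system"), None)
        if index is None:
            break
        del compacted[index]
    return compacted
```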

	## Environment Variables (excerpt)

	Defined in `env.example`:

	- `RAG_MCP_URL` - Default: `http://localhost:8900/rag` (unified MCP server)
	- `WEB_MCP_URL` - Default: `http://localhost:8900/web` (unified MCP server for Google web search)
	- `ADMIN_MCP_URL` - Default: `http://localhost:8900/admin` (unified MCP server)
	- `MCP_PORT` - Port for unified MCP server (default: 8900)
	- `MCP_HOST` - Host for unified MCP server (default: 0.0.0.0)
	- `POSTGRESQL_URL` - PostgreSQL connection string with pgvector extension
	- `OLLAMA_URL`, `OLLAMA_MODEL` (or `GROQ_API_KEY` + `LLM_BACKEND=groq`)
	- `SUPABASE_URL`, `SUPABASE_SERVICE_KEY` - Required for Supabase backend (admin rules + analytics)
	  - If not set, the system automatically falls back to SQLite in `data/` directory
	- `GOOGLE_SEARCH_API_KEY`, `GOOGLE_SEARCH_CX_ID` - Credentials for Google Programmable Search used by `web.search`
	- `MCP_MEMORY_MAX_ITEMS` - Maximum number of tool outputs to keep per session (default: 10)
	- `MCP_MEMORY_TTL_SECONDS` - Time-to-live for memory entries in seconds (default: 900)
	- `APP_ENV`, `LOG_LEVEL`, `API_PORT`

	Update these before starting the servers to ensure the agent can reach every MCP endpoint and LLM runtime.

	Note: The unified MCP server runs on a single port (default 8900) and handles all namespaced tools. The `start.bat` script automatically configures the correct URLs.
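
For reference, a settings loader that applies the documented defaults might look like the following. The `MCPSettings` dataclass and `load_settings` function are assumptions for this example; the variable names and default values are the ones listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MCPSettings:
    rag_url: str
    web_url: str
    admin_url: str
    host: str
    port: int
    memory_max_items: int
    memory_ttl_seconds: int


def load_settings(env: Mapping[str, str] = os.environ) -> MCPSettings:
    """Read MCP-related settings, falling back to the documented defaults."""
    return MCPSettings(
        rag_url=env.get("RAG_MCP_URL", "http://localhost:8900/rag"),
        web_url=env.get("WEB_MCP_URL", "http://localhost:8900/web"),
        admin_url=env.get("ADMIN_MCP_URL", "http://localhost:8900/admin"),
        host=env.get("MCP_HOST", "0.0.0.0"),
        port=int(env.get("MCP_PORT", "8900")),
        memory_max_items=int(env.get("MCP_MEMORY_MAX_ITEMS", "10")),
        memory_ttl_seconds=int(env.get("MCP_MEMORY_TTL_SECONDS", "900")),
    )
```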

	## Supabase Configuration

	Both `RulesStore` and `AnalyticsStore` support dual-backend storage with automatic detection:

	### Setup Steps

	1. Create Supabase tables:
	   - Run `supabase_admin_rules_table.sql` in Supabase SQL Editor (from repo root)
	   - Run `supabase_analytics_tables.sql` in Supabase SQL Editor (from repo root)

	2. Configure environment variables in `.env`:
	```env
	SUPABASE_URL=https://your-project-id.supabase.co
	SUPABASE_SERVICE_KEY=your_service_role_key_here
	```

	3. Verify configuration: Check that your Supabase project is accessible and tables are created correctly.

	4. Migrate existing data (if you have SQLite data): Use Supabase migration tools or database export/import methods.

	### How It Works

	- Automatic Detection: Both stores check for `SUPABASE_URL` and `SUPABASE_SERVICE_KEY` at initialization
	- Supabase First: If credentials are found, Supabase is used automatically
	- SQLite Fallback: If Supabase is not configured, SQLite databases in `data/` are used
	- Startup Logging: Check startup logs to see which backend each store is using:
	  - `✅ RulesStore: Using Supabase backend`
	  - `✅ AnalyticsStore: Using Supabase backend`
	  - Or `⚠️ RulesStore: Using SQLite backend` if Supabase is not configured
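
The detection logic above amounts to a check on two environment variables. A minimal sketch, assuming hypothetical `select_backend` and `startup_message` helpers (the real stores may perform additional checks, such as testing the connection):

```python
from typing import Mapping


def select_backend(env: Mapping[str, str]) -> str:
    """Pick 'supabase' when both credentials are present, otherwise 'sqlite'."""
    url = env.get("SUPABASE_URL", "").strip()
    key = env.get("SUPABASE_SERVICE_KEY", "").strip()
    return "supabase" if url and key else "sqlite"


def startup_message(store_name: str, backend: str) -> str:
    """Format a startup log line in the style shown above."""
    if backend == "supabase":
        return f"✅ {store_name}: Using Supabase backend"
    return f"⚠️ {store_name}: Using SQLite backend"
```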

	### Tables Used

	- Admin Rules: `admin_rules` table in Supabase
	- Analytics: `tool_usage_events`, `redflag_violations`, `rag_search_events`, `agent_query_events`

	For detailed Supabase setup instructions, refer to the Supabase documentation and ensure tables are created correctly.

	## Unified MCP Tool Instructions

	Agents that speak the Model Context Protocol should connect to the `integrachat` server id defined in `backend/mcp_server/server.py` and call the namespaced tools directly:

	| Namespace | Tool | Purpose | HTTP Endpoint |
	| --- | --- | --- | --- |
	| `rag` | `search` | Retrieve tenant-scoped document chunks | `POST /rag/search` |
	| `rag` | `ingest` | Chunk + store new knowledge | `POST /rag/ingest` |
	| `rag` | `list` | List all documents for tenant | `GET /rag/list?tenant_id={id}` |
	| `rag` | `delete` | Remove one/all stored documents | `DELETE /rag/delete/{id}?tenant_id={id}` or `DELETE /rag/delete-all?tenant_id={id}` |
	| `web` | `search` | Google Programmable Search (Custom Search API) | `POST /web/search` |
	| `admin` | `getRules` | Fetch tenant governance rules (list or detailed) | `POST /admin/getRules` |
	| `admin` | `addRule` | Insert or update a rule | `POST /admin/addRule` |
	| `admin` | `deleteRule` | Remove a rule by text | `POST /admin/deleteRule` |
	| `admin` | `logViolation` | Persist a red-flag event into analytics | `POST /admin/logViolation` |

	Important Notes:
	- Always send `tenant_id` in the payload (or as query parameter for GET/DELETE requests) so the shared middleware can enforce isolation and log analytics
	- The MCP server automatically normalizes tenant IDs to ensure consistent matching across operations
	- All endpoints support both POST (with JSON payload) and direct HTTP methods (GET for list, DELETE for delete operations)
	- Tenant ID normalization handles whitespace and ensures documents can be listed and deleted consistently
	- RAG search uses a default threshold of 0.3 for better recall; adjust via `threshold` parameter if needed
	- Conversation Memory: Send `session_id` (or `sessionId`/`conversation_id`/`conversationId`) in tool payloads to enable short-term memory. Recent tool outputs are automatically stored and injected into subsequent tool calls as a `memory` field. Send `end_session: true` to clear memory for a session.

	## Troubleshooting

	### RAG Search Not Returning Results
	- Check similarity threshold: The default threshold is 0.3. If results are still not found, try lowering it to 0.2 or 0.1
	- Verify documents are ingested: Use `GET /rag/list?tenant_id={id}` to confirm documents exist for the tenant
	- Check tenant ID matching: Ensure the tenant_id used for search matches the one used for ingestion (normalization handles whitespace automatically)
	- Review search logs: Check MCP server logs for search metrics (hits_count, avg_score, top_score)

	### Agent Not Using RAG for Knowledge Base Questions
	- Verify RAG results are being found: Check the agent debug endpoint (`POST /agent/debug`) to see if RAG results are being pre-fetched
	- Check tool scores: The debug output shows `rag_fitness` score; if it's low (< 0.4), the agent may skip RAG
	- Ensure knowledge base content exists: Questions like "who is the admin" require relevant content in the knowledge base
	- Pattern matching: The tool selector automatically triggers RAG for patterns like "admin", "who is", "what is", but semantic similarity also plays a role

	### Document Ingestion Permission Errors
	- 403 Forbidden: If you see "Role 'viewer' is not permitted to perform 'ingest_documents'":
	  - Ensure the `x-user-role` header is set to "editor", "admin", or "owner"
	  - Verify `x-user-role` header is being sent correctly (check backend logs for `🔍 DEBUG:` messages)
	  - Check that role is being propagated through the pipeline: route handler → process_ingestion → RAG client → MCP server
	  - Review debug logs to see where role might be getting lost or defaulting to "viewer"
	- Role Propagation: The role must flow through:
	  1. Client sends `x-user-role` header
	  2. Route handler receives it as `x_user_role` parameter
	  3. Route handler passes it to `process_ingestion(user_role=...)`
	  4. `process_ingestion` passes it to `rag_client.ingest_with_metadata(user_role=...)`
	  5. RAG client includes it in payload: `{"user_role": "...", ...}`
	  6. MCP server extracts it via `build_tenant_context()` and uses for permission checks
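
To make the final permission check concrete, here is a minimal sketch of a role gate that produces the error message quoted above. The role sets follow this README (ingestion: editor/admin/owner; deletion and rule management: admin/owner), but `ALLOWED_ROLES`, `PermissionDenied`, `check_permission`, and the action names other than `ingest_documents` are hypothetical.

```python
from typing import Optional

# Role requirements as described in this README; action names are partly assumed.
ALLOWED_ROLES = {
    "ingest_documents": {"editor", "admin", "owner"},
    "delete_documents": {"admin", "owner"},
    "manage_rules": {"admin", "owner"},
}


class PermissionDenied(Exception):
    """Would map to an HTTP 403 response (hypothetical class)."""


def check_permission(role: Optional[str], action: str) -> None:
    """Raise PermissionDenied unless the role may perform the action."""
    # A missing role defaults to "viewer", which is why a lost header shows up as a 403.
    effective = (role or "viewer").strip().lower()
    allowed = ALLOWED_ROLES[action]
    if effective not in allowed:
        raise PermissionDenied(
            f"Role '{effective}' is not permitted to perform '{action}'. "
            f"Allowed roles: {', '.join(sorted(allowed))}"
        )
```

If the role is dropped at any step of the chain, the check sees `None` and rejects the request as a viewer, which matches the symptom described in this section.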

	### Document Deletion Issues
	- 404 Not Found: Verify the document_id exists and belongs to the correct tenant
	- Tenant ID mismatch: The system normalizes tenant IDs, but ensure you're using the same tenant_id format as when documents were ingested
	- Check logs: Database deletion logs show detailed information about tenant ID matching and document existence
	- Role Propagation: Ensure user role is being passed correctly - deletion requires `admin` or `owner` role. The role is now properly propagated from API request → API → RAG Client → MCP Server

	### Rule Management Issues
	- Timeout Errors: If rule enhancement times out:
	  - Disable LLM enhancement by not setting `enhance=true` in the request
	  - Add rules in smaller batches (1-3 rules at a time)
	  - Enhancement timeout increased to 180s per chunk (5 rules)
	- Rule Deletion 404: Can now delete by rule number or full text. If using number, ensure it's a valid index (1-based)
	- Permission Errors: Rule management requires `admin` or `owner` role. Check that role is set correctly in the `x-user-role` header

	### Supabase Configuration Issues
	- Data still going to SQLite: Check that `SUPABASE_URL` and `SUPABASE_SERVICE_KEY` are set correctly in `.env` (no quotes, no spaces)
	- Service role key errors: Make sure you're using the service_role key (not anon key) from Supabase Dashboard → Settings → API
	- Tables don't exist: Run `supabase_admin_rules_table.sql` and `supabase_analytics_tables.sql` in Supabase SQL Editor
	- Permission errors: Check RLS policies in Supabase allow service role access
	- Startup warnings: Check FastAPI startup logs to see which backend each store is using (`✅` for Supabase, `⚠️` for SQLite fallback)