Commit 4c04529 · Parent(s): 78b6d7b
update the Readme.md files
Files changed:
- README.md +23 -6
- backend/README.md +66 -17
- frontend/README.md +8 -3
README.md CHANGED
@@ -48,8 +48,10 @@ This platform showcases how MCP can power intelligent, governed, multi-tenant AI

  1. **Backend services running**:
  - FastAPI API (`uvicorn backend.api.main:app --port 8000`)
- - MCP
+ - Unified MCP server (port 8900) as described in `backend/README.md`
  - Optional: Ollama / Groq credentials for the LLM client
+
+ **Quick Start**: Run `start.bat` (Windows) to launch all services automatically.
  2. **Python 3.10+** with the dependencies in `requirements.txt`

  ### Installation
@@ -122,11 +124,11 @@ Then open `http://localhost:3000`. The navbar links on the landing page route to

  | Purpose | Method & Path | Description |
  | --- | --- | --- |
- | Ingest document | `POST /rag/ingest-document` | Accepts `source_type`, `content`, metadata (filename, URL, doc_id) |
+ | Ingest document | `POST /rag/ingest-document` | Accepts `source_type`, `content`, metadata (filename, URL, doc_id). Supports raw text, URLs, PDFs, DOCX, TXT, and Markdown files |
  | Ingest file | `POST /rag/ingest-file` | Multipart upload with `x-tenant-id` header (PDF/DOCX/TXT/MD) |
- | List documents | `GET /rag/list` | Returns all documents for a tenant with pagination |
- | Delete document | `DELETE /rag/delete/{document_id}` | Deletes a specific document by ID |
- | Delete all documents | `DELETE /rag/delete-all` | Deletes all documents for a tenant |
+ | List documents | `GET /rag/list?tenant_id={id}&limit={n}&offset={n}` | Returns all documents for a tenant with pagination. Requires `x-tenant-id` header or `tenant_id` query parameter |
+ | Delete document | `DELETE /rag/delete/{document_id}?tenant_id={id}` | Deletes a specific document by ID. Requires `x-tenant-id` header or `tenant_id` query parameter |
+ | Delete all documents | `DELETE /rag/delete-all?tenant_id={id}` | Deletes all documents for a tenant. Requires `x-tenant-id` header or `tenant_id` query parameter |

  ### Admin & Governance Endpoints

@@ -235,7 +237,22 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
  - **LLM Integration**: Ollama (local) or Groq (cloud) via configurable backend
  - **Vector Store**: pgvector (via Supabase) or SQLite embeddings
  - **Analytics**: SQLite with indexed queries for fast analytics
- - **MCP
+ - **MCP Server**: Unified MCP server (port 8900) exposing all tools via namespaces
+ - **Database**: PostgreSQL with pgvector extension for RAG embeddings, SQLite for analytics
+
+ ## Key Technical Features
+
+ ### Tenant Isolation & Normalization
+ - **Strict tenant isolation** enforced at database level with `WHERE tenant_id = ...` filters
+ - **Automatic tenant ID normalization** handles whitespace and formatting differences
+ - Documents can be listed and deleted consistently across different tenant_id formats
+ - All operations validate tenant ownership before execution
+
+ ### MCP Server Architecture
+ - **Unified server** running on a single port (default 8900) for all namespaced tools
+ - **Dual protocol support**: Both MCP protocol (POST with JSON) and RESTful HTTP (GET/DELETE)
+ - **Response wrapping**: Standardized response format with automatic unwrapping in clients
+ - **Error handling**: Comprehensive error responses with detailed messages for debugging

  ## Acknowledgments

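The tenant isolation and normalization behaviour described in the README.md changes can be illustrated with a small sketch. This is not the project's actual code: the function names, the `documents` table layout, and the normalization rule (trim whitespace and lowercase) are assumptions made for illustration; the diff only states that whitespace and formatting differences are handled and that queries are filtered with `WHERE tenant_id = ...`.

```python
import sqlite3


def normalize_tenant_id(raw: str) -> str:
    """Hypothetical rule: trim surrounding whitespace and lowercase the ID."""
    normalized = raw.strip().lower()
    if not normalized:
        raise ValueError("tenant_id must not be empty")
    return normalized


def list_documents(conn: sqlite3.Connection, tenant_id: str,
                   limit: int = 10, offset: int = 0) -> list:
    """Return (doc_id, title) rows that belong to a single tenant."""
    rows = conn.execute(
        "SELECT doc_id, title FROM documents "
        "WHERE tenant_id = ? ORDER BY doc_id LIMIT ? OFFSET ?",
        (normalize_tenant_id(tenant_id), limit, offset),
    )
    return rows.fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table (assumed schema) with differently formatted IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "doc_id INTEGER PRIMARY KEY, tenant_id TEXT, title TEXT)"
    )
    docs = [(" Acme ", "handbook"), ("acme", "faq"), ("globex", "pricing")]
    # Normalize on write as well, so reads and writes agree on one format.
    conn.executemany(
        "INSERT INTO documents (tenant_id, title) VALUES (?, ?)",
        [(normalize_tenant_id(tenant), title) for tenant, title in docs],
    )
    return conn
```

Normalizing on both the write path and the read path is one way to make listing and deletion behave the same regardless of how the caller formatted the tenant ID; the parameterized query keeps the tenant filter out of string concatenation.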
backend/README.md CHANGED
@@ -33,10 +33,27 @@ cp env.example .env # update MCP URLs + LLM settings
  ```bash
  python backend/mcp_server/server.py
  ```
-
-
- -
+ Or use the provided startup script:
+ ```bash
+ start.bat # Windows - launches MCP server on port 8900 and FastAPI on port 8000
+ ```
+
+ This single server (default port 8900) exposes the following namespaced tools:
+ - `rag.search` - Semantic search across tenant documents
+ - `rag.ingest` - Ingest text content into knowledge base
+ - `rag.delete` - Delete individual or all documents for a tenant
+ - `rag.list` - List all documents for a tenant with pagination
+ - `web.search` - DuckDuckGo-based web search
  - `admin.getRules`, `admin.addRule`, `admin.deleteRule`, `admin.logViolation`
+
+ **HTTP Endpoints** (for direct API access):
+ - `GET /rag/list?tenant_id={id}&limit={n}&offset={n}` - List documents
+ - `POST /rag/ingest` - Ingest content
+ - `POST /rag/search` - Search documents
+ - `DELETE /rag/delete/{document_id}?tenant_id={id}` - Delete specific document
+ - `DELETE /rag/delete-all?tenant_id={id}` - Delete all documents
+ - `POST /web/search` - Web search
+ - `POST /admin/*` - Admin operations

  3. **Optional workers** (if running Celery-based ingestion/analytics jobs):
  ```bash
@@ -56,7 +73,10 @@ All endpoints require the `x-tenant-id` header unless otherwise noted.
  | Agent Debug | `POST /agent/debug` | Full reasoning trace + tool plan |
  | Agent Plan | `POST /agent/plan` | Dry-run planning without executing tools |
  | RAG | `POST /rag/ingest-document` | Rich ingestion (text, URL, metadata) |
- | RAG | `
+ | RAG | `POST /rag/ingest-file` | File upload (PDF/DOCX/TXT/MD) |
+ | RAG | `GET /rag/list` | Paginated document listing per tenant (requires `x-tenant-id` header) |
+ | RAG | `DELETE /rag/delete/{document_id}` | Delete specific document (requires `x-tenant-id` header) |
+ | RAG | `DELETE /rag/delete-all` | Delete all documents for tenant (requires `x-tenant-id` header) |
  | Admin | `POST /admin/rules` | Regex + severity rule ingestion |
  | Analytics | `GET /analytics/overview` | Summary metrics (queries, tokens, red flags) |

@@ -72,31 +92,60 @@ Use the helper scripts in the repo root when validating backend changes:

  > **Troubleshooting tip:** If the isolation script reports a failure, first run `check_rag_database.py` to confirm documents are tagged with the correct `tenant_id`, then restart the unified MCP server so it reloads the updated SQL filtering logic.

+ ## Recent Improvements
+
+ ### Tenant ID Normalization
+ - All database operations now normalize tenant IDs to handle whitespace and formatting differences
+ - Documents can be listed and deleted consistently even if stored with slightly different tenant_id formatting
+ - The system automatically matches tenant IDs after normalization, ensuring operations work across different input formats
+
+ ### HTTP Endpoint Support
+ - Added GET support for `/rag/list` endpoint (previously POST-only)
+ - Added DELETE support for `/rag/delete/{document_id}` and `/rag/delete-all` endpoints
+ - All endpoints support both MCP protocol (POST with JSON payload) and direct HTTP methods (GET/DELETE with query parameters)
+
+ ### Response Format
+ - MCP server responses are wrapped in a standard format with `status`, `data`, and `metadata` fields
+ - RAG client automatically unwraps responses for seamless integration
+ - Error responses include detailed messages for better debugging
+
  ## Environment Variables (excerpt)

  Defined in `env.example`:

- - `RAG_MCP_URL
+ - `RAG_MCP_URL` - Default: `http://localhost:8900/rag` (unified MCP server)
+ - `WEB_MCP_URL` - Default: `http://localhost:8900/web` (unified MCP server)
+ - `ADMIN_MCP_URL` - Default: `http://localhost:8900/admin` (unified MCP server)
+ - `MCP_PORT` - Port for unified MCP server (default: 8900)
+ - `MCP_HOST` - Host for unified MCP server (default: 0.0.0.0)
+ - `POSTGRESQL_URL` - PostgreSQL connection string with pgvector extension
  - `OLLAMA_URL`, `OLLAMA_MODEL` (or `GROQ_API_KEY` + `LLM_BACKEND=groq`)
  - `SUPABASE_URL`, `SUPABASE_SERVICE_KEY` (optional admin integrations)
  - `APP_ENV`, `LOG_LEVEL`, `API_PORT`

  Update these before starting the servers to ensure the agent can reach every MCP endpoint and LLM runtime.

+ **Note**: The unified MCP server runs on a single port (default 8900) and handles all namespaced tools. The `start.bat` script automatically configures the correct URLs.
+
  ## Unified MCP tool instructions

  Agents that speak the Model Context Protocol should connect to the `integrachat` server id defined in `backend/mcp_server/server.py` and call the namespaced tools directly:

- | Namespace | Tool | Purpose |
- | --- | --- | --- |
- | `rag` | `search` | Retrieve tenant-scoped document chunks |
- | `rag` | `ingest` | Chunk + store new knowledge |
- | `rag` | `
- | `
- | `
- | `admin` | `
- | `admin` | `
- | `admin` | `
-
-
+ | Namespace | Tool | Purpose | HTTP Endpoint |
+ | --- | --- | --- | --- |
+ | `rag` | `search` | Retrieve tenant-scoped document chunks | `POST /rag/search` |
+ | `rag` | `ingest` | Chunk + store new knowledge | `POST /rag/ingest` |
+ | `rag` | `list` | List all documents for tenant | `GET /rag/list?tenant_id={id}` |
+ | `rag` | `delete` | Remove one/all stored documents | `DELETE /rag/delete/{id}?tenant_id={id}` or `DELETE /rag/delete-all?tenant_id={id}` |
+ | `web` | `search` | DuckDuckGo English-biased search | `POST /web/search` |
+ | `admin` | `getRules` | Fetch tenant governance rules (list or detailed) | `POST /admin/getRules` |
+ | `admin` | `addRule` | Insert or update a rule | `POST /admin/addRule` |
+ | `admin` | `deleteRule` | Remove a rule by text | `POST /admin/deleteRule` |
+ | `admin` | `logViolation` | Persist a red-flag event into analytics | `POST /admin/logViolation` |
+
+ **Important Notes:**
+ - Always send `tenant_id` in the payload (or as query parameter for GET/DELETE requests) so the shared middleware can enforce isolation and log analytics
+ - The MCP server automatically normalizes tenant IDs to ensure consistent matching across operations
+ - All endpoints support both POST (with JSON payload) and direct HTTP methods (GET for list, DELETE for delete operations)
+ - Tenant ID normalization handles whitespace and ensures documents can be listed and deleted consistently

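The backend/README.md changes mention that MCP server responses are wrapped with `status`, `data`, and `metadata` fields and that the RAG client unwraps them automatically. A client-side helper for that could look roughly like the sketch below. Only the three field names come from the diff; the `"success"` status value, the `message` key inside `metadata`, and the names `unwrap_response` and `MCPResponseError` are hypothetical.

```python
from typing import Any

SUCCESS_STATUS = "success"  # assumed value; the diff only names the `status` field


class MCPResponseError(Exception):
    """Raised when a wrapped response reports a non-success status."""


def unwrap_response(payload: Any) -> Any:
    """Return the `data` field of a wrapped response.

    Payloads without a `status` key are treated as already unwrapped and
    returned unchanged, so the helper is safe to apply more than once.
    """
    if not isinstance(payload, dict) or "status" not in payload:
        return payload
    if payload["status"] != SUCCESS_STATUS:
        metadata = payload.get("metadata") or {}
        # `message` is a hypothetical key for the detailed error text.
        detail = metadata.get("message", "no detail provided")
        raise MCPResponseError(
            f"MCP call failed ({payload['status']}): {detail}"
        )
    return payload.get("data")
```

Raising on a non-success status keeps error details visible to the caller instead of returning an empty result, which fits the stated goal of detailed messages for debugging.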
frontend/README.md CHANGED
@@ -39,14 +39,19 @@ NEXT_PUBLIC_API_URL=http://localhost:8000
  - **Ingestion card** for quick document uploads

  ### Knowledge Base Page (`/knowledge-base`)
- - **Document listing** with pagination and filtering
+ - **Document listing** with pagination and filtering by type (text, PDF, FAQ, link)
  - **Search interface** for semantic search across documents
  - **Document ingestion** with support for:
  - Raw text input
+ - URL ingestion (automatic content fetching)
  - PDF file uploads
  - DOCX file uploads
- - TXT file uploads
- - **Document management** with tenant isolation
+ - TXT and Markdown file uploads
+ - **Document management** with tenant isolation:
+ - Delete individual documents by ID
+ - Delete all documents for a tenant (with confirmation)
+ - Real-time document list updates after operations
+ - Error handling with clear user feedback

  ### Components

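The listing and deletion features of the Knowledge Base page map onto the `/rag/list`, `/rag/delete/{document_id}`, and `/rag/delete-all` paths documented in this commit. As a rough illustration of how a client might assemble those request URLs, here is a sketch written in Python to match the backend. The helper names are hypothetical, and the default base URL is taken from the `NEXT_PUBLIC_API_URL` value in the hunk header above; the real frontend may build its requests differently or pass the tenant in the `x-tenant-id` header instead.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # NEXT_PUBLIC_API_URL value shown in the diff


def list_url(tenant_id: str, limit: int = 20, offset: int = 0,
             base: str = BASE_URL) -> str:
    """Build the paginated listing URL for one tenant."""
    query = urlencode({"tenant_id": tenant_id, "limit": limit, "offset": offset})
    return f"{base}/rag/list?{query}"


def delete_url(tenant_id: str, document_id: Optional[str] = None,
               base: str = BASE_URL) -> str:
    """Build a delete URL: one document if an ID is given, otherwise all."""
    query = urlencode({"tenant_id": tenant_id})
    if document_id is None:
        return f"{base}/rag/delete-all?{query}"
    # Percent-encode the ID so spaces or slashes cannot alter the path.
    return f"{base}/rag/delete/{quote(str(document_id), safe='')}?{query}"
```

Encoding both the path segment and the query string avoids malformed requests when tenant or document IDs contain spaces or reserved characters.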