nothingworry committed on
Commit 4c04529 · 1 Parent(s): 78b6d7b

Update the README.md files

Files changed (3)
  1. README.md +23 -6
  2. backend/README.md +66 -17
  3. frontend/README.md +8 -3
README.md CHANGED
@@ -48,8 +48,10 @@ This platform showcases how MCP can power intelligent, governed, multi-tenant AI
48
 
49
  1. **Backend services running**:
50
  - FastAPI API (`uvicorn backend.api.main:app --port 8000`)
51
- - MCP servers (RAG 8001, Web 8002, Admin 8003) as described in `backend/README.md`
52
  - Optional: Ollama / Groq credentials for the LLM client
 
 
53
  2. **Python 3.10+** with the dependencies in `requirements.txt`
54
 
55
  ### Installation
@@ -122,11 +124,11 @@ Then open `http://localhost:3000`. The navbar links on the landing page route to
122
 
123
  | Purpose | Method & Path | Description |
124
  | --- | --- | --- |
125
- | Ingest document | `POST /rag/ingest-document` | Accepts `source_type`, `content`, metadata (filename, URL, doc_id) |
126
  | Ingest file | `POST /rag/ingest-file` | Multipart upload with `x-tenant-id` header (PDF/DOCX/TXT/MD) |
127
- | List documents | `GET /rag/list` | Returns all documents for a tenant with pagination |
128
- | Delete document | `DELETE /rag/delete/{document_id}` | Deletes a specific document by ID |
129
- | Delete all documents | `DELETE /rag/delete-all` | Deletes all documents for a tenant |
130
 
131
  ### Admin & Governance Endpoints
132
 
@@ -235,7 +237,22 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
235
  - **LLM Integration**: Ollama (local) or Groq (cloud) via configurable backend
236
  - **Vector Store**: pgvector (via Supabase) or SQLite embeddings
237
  - **Analytics**: SQLite with indexed queries for fast analytics
238
- - **MCP Servers**: RAG (8001), Web (8002), Admin (8003)
239
 
240
  ## Acknowledgments
241
 
 
48
 
49
  1. **Backend services running**:
50
  - FastAPI API (`uvicorn backend.api.main:app --port 8000`)
51
+ - Unified MCP server (port 8900) as described in `backend/README.md`
52
  - Optional: Ollama / Groq credentials for the LLM client
53
+
54
+ **Quick Start**: Run `start.bat` (Windows) to launch all services automatically.
55
  2. **Python 3.10+** with the dependencies in `requirements.txt`
56
 
57
  ### Installation
 
124
 
125
  | Purpose | Method & Path | Description |
126
  | --- | --- | --- |
127
+ | Ingest document | `POST /rag/ingest-document` | Accepts `source_type`, `content`, and metadata (filename, URL, doc_id). Supports raw text, URLs, and PDF, DOCX, TXT, and Markdown files |
128
  | Ingest file | `POST /rag/ingest-file` | Multipart upload with `x-tenant-id` header (PDF/DOCX/TXT/MD) |
129
+ | List documents | `GET /rag/list?tenant_id={id}&limit={n}&offset={n}` | Returns all documents for a tenant with pagination. Requires `x-tenant-id` header or `tenant_id` query parameter |
130
+ | Delete document | `DELETE /rag/delete/{document_id}?tenant_id={id}` | Deletes a specific document by ID. Requires `x-tenant-id` header or `tenant_id` query parameter |
131
+ | Delete all documents | `DELETE /rag/delete-all?tenant_id={id}` | Deletes all documents for a tenant. Requires `x-tenant-id` header or `tenant_id` query parameter |
132
 
133
  ### Admin & Governance Endpoints
134
 
 
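The list and delete rows in the table above can be exercised with a small request builder. This is a minimal sketch, not the project's client: the base URL assumes the FastAPI port from the prerequisites, and the helper names `list_documents_request` and `delete_document_request` are hypothetical. It sends both the `x-tenant-id` header and the `tenant_id` query parameter, although the table says either one is enough.

```python
from urllib.parse import quote, urlencode

# Assumption: the FastAPI app listens on port 8000, as in the prerequisites.
BASE_URL = "http://localhost:8000"


def list_documents_request(tenant_id, limit=20, offset=0, base_url=BASE_URL):
    """Build (method, url, headers) for GET /rag/list."""
    query = urlencode({"tenant_id": tenant_id, "limit": limit, "offset": offset})
    return "GET", f"{base_url}/rag/list?{query}", {"x-tenant-id": tenant_id}


def delete_document_request(tenant_id, document_id, base_url=BASE_URL):
    """Build (method, url, headers) for DELETE /rag/delete/{document_id}."""
    # Percent-encode the ID so spaces or slashes cannot change the path.
    doc = quote(str(document_id), safe="")
    query = urlencode({"tenant_id": tenant_id})
    return "DELETE", f"{base_url}/rag/delete/{doc}?{query}", {"x-tenant-id": tenant_id}


if __name__ == "__main__":
    print(list_documents_request("acme", limit=10, offset=5))
    print(delete_document_request("acme", "doc 1"))
```

The returned tuple can be passed to any HTTP client; keeping the builder free of network calls makes it easy to check the URLs in isolation.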
237
  - **LLM Integration**: Ollama (local) or Groq (cloud) via configurable backend
238
  - **Vector Store**: pgvector (via Supabase) or SQLite embeddings
239
  - **Analytics**: SQLite with indexed queries for fast analytics
240
+ - **MCP Server**: Unified MCP server (port 8900) exposing all tools via namespaces
241
+ - **Database**: PostgreSQL with pgvector extension for RAG embeddings, SQLite for analytics
242
+
243
+ ## Key Technical Features
244
+
245
+ ### Tenant Isolation & Normalization
246
+ - **Strict tenant isolation** enforced at database level with `WHERE tenant_id = ...` filters
247
+ - **Automatic tenant ID normalization** handles whitespace and formatting differences
248
+ - Documents can be listed and deleted consistently across different `tenant_id` formats
249
+ - All operations validate tenant ownership before execution
250
+
251
+ ### MCP Server Architecture
252
+ - **Unified server** running on a single port (default 8900) for all namespaced tools
253
+ - **Dual protocol support**: Both MCP protocol (POST with JSON) and RESTful HTTP (GET/DELETE)
254
+ - **Response wrapping**: Standardized response format with automatic unwrapping in clients
255
+ - **Error handling**: Comprehensive error responses with detailed messages for debugging
256
 
257
  ## Acknowledgments
258
 
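The tenant isolation and normalization bullets in the README diff above can be illustrated with a short sketch. It rests on several assumptions: normalization is taken to mean trimming whitespace and lowercasing (the README only says "whitespace and formatting differences"), an in-memory SQLite table stands in for the PostgreSQL/pgvector store, and the function and table names are hypothetical.

```python
import sqlite3


def normalize_tenant_id(raw):
    # Assumption: normalization = strip surrounding whitespace + lowercase.
    return raw.strip().lower()


# Stand-in store; the README describes PostgreSQL with pgvector instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, tenant_id TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO documents (tenant_id, title) VALUES (?, ?)",
    [
        (normalize_tenant_id(" Acme "), "handbook"),
        (normalize_tenant_id("acme"), "faq"),
        (normalize_tenant_id("globex"), "pricing"),
    ],
)


def list_documents(conn, tenant_id):
    """Return titles for one tenant; the WHERE clause enforces isolation."""
    rows = conn.execute(
        "SELECT title FROM documents WHERE tenant_id = ? ORDER BY id",
        (normalize_tenant_id(tenant_id),),
    )
    return [row[0] for row in rows]


def delete_all(conn, tenant_id):
    """Delete every document owned by one tenant and return the row count."""
    cur = conn.execute(
        "DELETE FROM documents WHERE tenant_id = ?",
        (normalize_tenant_id(tenant_id),),
    )
    return cur.rowcount
```

Normalizing on both write and read is what lets `" Acme "` and `"acme"` resolve to the same rows, while the parameterized `WHERE tenant_id = ?` keeps another tenant's documents out of every result.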
backend/README.md CHANGED
@@ -33,10 +33,27 @@ cp env.example .env # update MCP URLs + LLM settings
33
  ```bash
34
  python backend/mcp_server/server.py
35
  ```
36
- This single endpoint exposes the following namespaced tools:
37
- - `rag.search`, `rag.ingest`, `rag.delete`
38
- - `web.search`
39
  - `admin.getRules`, `admin.addRule`, `admin.deleteRule`, `admin.logViolation`
40
 
41
  3. **Optional workers** (if running Celery-based ingestion/analytics jobs):
42
  ```bash
@@ -56,7 +73,10 @@ All endpoints require the `x-tenant-id` header unless otherwise noted.
56
  | Agent Debug | `POST /agent/debug` | Full reasoning trace + tool plan |
57
  | Agent Plan | `POST /agent/plan` | Dry-run planning without executing tools |
58
  | RAG | `POST /rag/ingest-document` | Rich ingestion (text, URL, metadata) |
59
- | RAG | `GET /rag/list` | Paginated document listing per tenant |
 
 
 
60
  | Admin | `POST /admin/rules` | Regex + severity rule ingestion |
61
  | Analytics | `GET /analytics/overview` | Summary metrics (queries, tokens, red flags) |
62
 
@@ -72,31 +92,60 @@ Use the helper scripts in the repo root when validating backend changes:
72
 
73
  > **Troubleshooting tip:** If the isolation script reports a failure, first run `check_rag_database.py` to confirm documents are tagged with the correct `tenant_id`, then restart the unified MCP server so it reloads the updated SQL filtering logic.
74
 
75
  ## Environment Variables (excerpt)
76
 
77
  Defined in `env.example`:
78
 
79
- - `RAG_MCP_URL`, `WEB_MCP_URL`, `ADMIN_MCP_URL`
80
  - `OLLAMA_URL`, `OLLAMA_MODEL` (or `GROQ_API_KEY` + `LLM_BACKEND=groq`)
81
  - `SUPABASE_URL`, `SUPABASE_SERVICE_KEY` (optional admin integrations)
82
  - `APP_ENV`, `LOG_LEVEL`, `API_PORT`
83
 
84
  Update these before starting the servers to ensure the agent can reach every MCP endpoint and LLM runtime.
85
 
 
 
86
  ## Unified MCP tool instructions
87
 
88
  Agents that speak the Model Context Protocol should connect to the `integrachat` server id defined in `backend/mcp_server/server.py` and call the namespaced tools directly:
89
 
90
- | Namespace | Tool | Purpose |
91
- | --- | --- | --- |
92
- | `rag` | `search` | Retrieve tenant-scoped document chunks |
93
- | `rag` | `ingest` | Chunk + store new knowledge |
94
- | `rag` | `delete` | Remove one/all stored documents |
95
- | `web` | `search` | DuckDuckGo English-biased search |
96
- | `admin` | `getRules` | Fetch tenant governance rules (list or detailed) |
97
- | `admin` | `addRule` | Insert or update a rule |
98
- | `admin` | `deleteRule` | Remove a rule by text |
99
- | `admin` | `logViolation` | Persist a red-flag event into analytics |
100
-
101
- Always send `tenant_id`, and optionally `user_id`, in the payload so the shared middleware can enforce isolation and log analytics.
102
 
 
33
  ```bash
34
  python backend/mcp_server/server.py
35
  ```
36
+ Or use the provided startup script:
37
+ ```bash
38
+ start.bat # Windows - launches MCP server on port 8900 and FastAPI on port 8000
39
+ ```
40
+
41
+ This single server (default port 8900) exposes the following namespaced tools:
42
+ - `rag.search` - Semantic search across tenant documents
43
+ - `rag.ingest` - Ingest text content into knowledge base
44
+ - `rag.delete` - Delete individual or all documents for a tenant
45
+ - `rag.list` - List all documents for a tenant with pagination
46
+ - `web.search` - DuckDuckGo-based web search
47
  - `admin.getRules`, `admin.addRule`, `admin.deleteRule`, `admin.logViolation`
48
+
49
+ **HTTP Endpoints** (for direct API access):
50
+ - `GET /rag/list?tenant_id={id}&limit={n}&offset={n}` - List documents
51
+ - `POST /rag/ingest` - Ingest content
52
+ - `POST /rag/search` - Search documents
53
+ - `DELETE /rag/delete/{document_id}?tenant_id={id}` - Delete specific document
54
+ - `DELETE /rag/delete-all?tenant_id={id}` - Delete all documents
55
+ - `POST /web/search` - Web search
56
+ - `POST /admin/*` - Admin operations
57
 
58
  3. **Optional workers** (if running Celery-based ingestion/analytics jobs):
59
  ```bash
 
73
  | Agent Debug | `POST /agent/debug` | Full reasoning trace + tool plan |
74
  | Agent Plan | `POST /agent/plan` | Dry-run planning without executing tools |
75
  | RAG | `POST /rag/ingest-document` | Rich ingestion (text, URL, metadata) |
76
+ | RAG | `POST /rag/ingest-file` | File upload (PDF/DOCX/TXT/MD) |
77
+ | RAG | `GET /rag/list` | Paginated document listing per tenant (requires `x-tenant-id` header) |
78
+ | RAG | `DELETE /rag/delete/{document_id}` | Delete specific document (requires `x-tenant-id` header) |
79
+ | RAG | `DELETE /rag/delete-all` | Delete all documents for tenant (requires `x-tenant-id` header) |
80
  | Admin | `POST /admin/rules` | Regex + severity rule ingestion |
81
  | Analytics | `GET /analytics/overview` | Summary metrics (queries, tokens, red flags) |
82
 
 
92
 
93
  > **Troubleshooting tip:** If the isolation script reports a failure, first run `check_rag_database.py` to confirm documents are tagged with the correct `tenant_id`, then restart the unified MCP server so it reloads the updated SQL filtering logic.
94
 
95
+ ## Recent Improvements
96
+
97
+ ### Tenant ID Normalization
98
+ - All database operations now normalize tenant IDs to handle whitespace and formatting differences
99
+ - Documents can be listed and deleted consistently even if stored with slightly different `tenant_id` formatting
100
+ - The system automatically matches tenant IDs after normalization, ensuring operations work across different input formats
101
+
102
+ ### HTTP Endpoint Support
103
+ - Added GET support for `/rag/list` endpoint (previously POST-only)
104
+ - Added DELETE support for `/rag/delete/{document_id}` and `/rag/delete-all` endpoints
105
+ - The list and delete endpoints accept both MCP-style calls (POST with a JSON payload) and direct HTTP methods (GET/DELETE with query parameters)
106
+
107
+ ### Response Format
108
+ - MCP server responses are wrapped in a standard format with `status`, `data`, and `metadata` fields
109
+ - RAG client automatically unwraps responses for seamless integration
110
+ - Error responses include detailed messages for better debugging
111
+
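The response wrapping described in the "Response Format" bullets above could be unwrapped on the client side roughly as follows. This is a sketch under stated assumptions: the README names the `status`, `data`, and `metadata` fields but not their values, so the `"success"` status string, the `message` key inside `metadata`, and the `McpToolError` class are all hypothetical.

```python
class McpToolError(Exception):
    """Raised when a wrapped response reports a non-success status."""


def unwrap_response(envelope):
    """Return the `data` payload of a wrapped MCP response.

    Values that do not look like an envelope are passed through unchanged,
    so the helper is safe to call on responses that are already unwrapped.
    """
    if not isinstance(envelope, dict) or "status" not in envelope:
        return envelope
    # Assumption: "success" marks a good response; the README does not say.
    if envelope["status"] != "success":
        metadata = envelope.get("metadata") or {}
        # Assumption: the error text lives under metadata["message"].
        raise McpToolError(metadata.get("message") or "MCP tool call failed")
    return envelope.get("data")
```

Raising on a failed status rather than returning `None` keeps error messages visible to the caller, which matches the stated goal of detailed messages for debugging.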
112
  ## Environment Variables (excerpt)
113
 
114
  Defined in `env.example`:
115
 
116
+ - `RAG_MCP_URL` - Default: `http://localhost:8900/rag` (unified MCP server)
117
+ - `WEB_MCP_URL` - Default: `http://localhost:8900/web` (unified MCP server)
118
+ - `ADMIN_MCP_URL` - Default: `http://localhost:8900/admin` (unified MCP server)
119
+ - `MCP_PORT` - Port for unified MCP server (default: 8900)
120
+ - `MCP_HOST` - Host for unified MCP server (default: 0.0.0.0)
121
+ - `POSTGRESQL_URL` - PostgreSQL connection string with pgvector extension
122
  - `OLLAMA_URL`, `OLLAMA_MODEL` (or `GROQ_API_KEY` + `LLM_BACKEND=groq`)
123
  - `SUPABASE_URL`, `SUPABASE_SERVICE_KEY` (optional admin integrations)
124
  - `APP_ENV`, `LOG_LEVEL`, `API_PORT`
125
 
126
  Update these before starting the servers to ensure the agent can reach every MCP endpoint and LLM runtime.
127
 
128
+ **Note**: The unified MCP server runs on a single port (default 8900) and handles all namespaced tools. The `start.bat` script automatically configures the correct URLs.
129
+
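The environment variables and defaults listed above can be read in Python along these lines. This is a sketch only: the `McpSettings` dataclass and `load_mcp_settings` function are hypothetical names, and deriving the URL defaults from `MCP_PORT` is an assumption (with the default port it reproduces the documented `http://localhost:8900/...` values).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class McpSettings:
    host: str
    port: int
    rag_url: str
    web_url: str
    admin_url: str


def load_mcp_settings(env=None):
    """Read MCP settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    port = int(env.get("MCP_PORT", "8900"))
    # Assumption: unset URLs fall back to localhost on the configured port.
    base = f"http://localhost:{port}"
    return McpSettings(
        host=env.get("MCP_HOST", "0.0.0.0"),
        port=port,
        rag_url=env.get("RAG_MCP_URL", f"{base}/rag"),
        web_url=env.get("WEB_MCP_URL", f"{base}/web"),
        admin_url=env.get("ADMIN_MCP_URL", f"{base}/admin"),
    )


if __name__ == "__main__":
    print(load_mcp_settings())
```

Accepting a plain mapping instead of reading `os.environ` directly makes the defaults easy to check without touching the real environment.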
130
  ## Unified MCP tool instructions
131
 
132
  Agents that speak the Model Context Protocol should connect to the `integrachat` server id defined in `backend/mcp_server/server.py` and call the namespaced tools directly:
133
 
134
+ | Namespace | Tool | Purpose | HTTP Endpoint |
135
+ | --- | --- | --- | --- |
136
+ | `rag` | `search` | Retrieve tenant-scoped document chunks | `POST /rag/search` |
137
+ | `rag` | `ingest` | Chunk + store new knowledge | `POST /rag/ingest` |
138
+ | `rag` | `list` | List all documents for tenant | `GET /rag/list?tenant_id={id}` |
139
+ | `rag` | `delete` | Remove one/all stored documents | `DELETE /rag/delete/{id}?tenant_id={id}` or `DELETE /rag/delete-all?tenant_id={id}` |
140
+ | `web` | `search` | DuckDuckGo English-biased search | `POST /web/search` |
141
+ | `admin` | `getRules` | Fetch tenant governance rules (list or detailed) | `POST /admin/getRules` |
142
+ | `admin` | `addRule` | Insert or update a rule | `POST /admin/addRule` |
143
+ | `admin` | `deleteRule` | Remove a rule by text | `POST /admin/deleteRule` |
144
+ | `admin` | `logViolation` | Persist a red-flag event into analytics | `POST /admin/logViolation` |
145
+
146
+ **Important Notes:**
147
+ - Always send `tenant_id` in the payload (or as query parameter for GET/DELETE requests) so the shared middleware can enforce isolation and log analytics
148
+ - The MCP server automatically normalizes tenant IDs to ensure consistent matching across operations
149
+ - The list and delete endpoints support both POST (with a JSON payload) and direct HTTP methods (GET for list, DELETE for delete operations)
150
+ - Tenant ID normalization handles whitespace and ensures documents can be listed and deleted consistently
151
 
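The namespace-to-endpoint table in the backend README diff above maps naturally onto a small routing helper. The sketch below is illustrative rather than the server's actual dispatch code: `TOOL_ROUTES` and `build_tool_call` are hypothetical names, and argument names such as `query`, `limit`, and `document_id` are assumptions about the payload shape. It always includes `tenant_id`, as the notes require.

```python
from urllib.parse import quote

# Method and path per namespaced tool, copied from the README table.
TOOL_ROUTES = {
    "rag.search": ("POST", "/rag/search"),
    "rag.ingest": ("POST", "/rag/ingest"),
    "rag.list": ("GET", "/rag/list"),
    "web.search": ("POST", "/web/search"),
    "admin.getRules": ("POST", "/admin/getRules"),
    "admin.addRule": ("POST", "/admin/addRule"),
    "admin.deleteRule": ("POST", "/admin/deleteRule"),
    "admin.logViolation": ("POST", "/admin/logViolation"),
}


def build_tool_call(tool, tenant_id, **arguments):
    """Describe the HTTP request for a namespaced tool call."""
    if tool == "rag.delete":
        # One tool, two endpoints: a document ID selects the single delete.
        document_id = arguments.pop("document_id", None)
        if document_id is None:
            path = "/rag/delete-all"
        else:
            path = f"/rag/delete/{quote(str(document_id), safe='')}"
        return {"method": "DELETE", "path": path,
                "query": {"tenant_id": tenant_id, **arguments}}
    method, path = TOOL_ROUTES[tool]
    params = {"tenant_id": tenant_id, **arguments}
    # POST carries a JSON payload; GET carries query parameters.
    key = "json" if method == "POST" else "query"
    return {"method": method, "path": path, key: params}
```

For example, `build_tool_call("rag.list", "acme", limit=10)` describes a `GET /rag/list` request with `tenant_id` and `limit` as query parameters, while `build_tool_call("rag.delete", "acme")` targets `/rag/delete-all`.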
frontend/README.md CHANGED
@@ -39,14 +39,19 @@ NEXT_PUBLIC_API_URL=http://localhost:8000
39
  - **Ingestion card** for quick document uploads
40
 
41
  ### Knowledge Base Page (`/knowledge-base`)
42
- - **Document listing** with pagination and filtering
43
  - **Search interface** for semantic search across documents
44
  - **Document ingestion** with support for:
45
  - Raw text input
 
46
  - PDF file uploads
47
  - DOCX file uploads
48
- - TXT file uploads
49
- - **Document management** with tenant isolation
50
 
51
  ### Components
52
 
 
39
  - **Ingestion card** for quick document uploads
40
 
41
  ### Knowledge Base Page (`/knowledge-base`)
42
+ - **Document listing** with pagination and filtering by type (text, PDF, FAQ, link)
43
  - **Search interface** for semantic search across documents
44
  - **Document ingestion** with support for:
45
  - Raw text input
46
+ - URL ingestion (automatic content fetching)
47
  - PDF file uploads
48
  - DOCX file uploads
49
+ - TXT and Markdown file uploads
50
+ - **Document management** with tenant isolation:
51
+ - Delete individual documents by ID
52
+ - Delete all documents for a tenant (with confirmation)
53
+ - Real-time document list updates after operations
54
+ - Error handling with clear user feedback
55
 
56
  ### Components
57