Commit 9155d63 (parent: 611e2c1): update the readme files

Files changed:
- README.md (+56 -6)
- backend/README.md (+60 -2)
README.md (CHANGED)

@@ -220,11 +220,52 @@ All calls are proxied through the FastAPI backend running at `http://localhost:8
 ### Data Storage

-- `data/admin_rules.db` - Admin rules with regex patterns and severity
-- `data/analytics.db` - Analytics events, tool usage, violations, RAG metrics
-- **Production
@@ -244,6 +285,15 @@ IntegraChat ships with several helper scripts to validate the full stack end-to-

 - `python check_rag_database.py`
   Provides a low-level inspection of the RAG datastore. It connects straight to the pgvector/Postgres instance, lists all tenant IDs, prints sample chunks, and runs `search_vectors()` directly to ensure the SQL `WHERE tenant_id = …` filter is behaving as expected. Use this script when diagnosing suspected cross-tenant leakage or when seeding demo data.
@@ -286,8 +336,8 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file

 - **UI Libraries**: Plotly for interactive charts, Gradio for web interface
 - **LLM Integration**: Ollama (local) or Groq (cloud) via configurable backend with streaming support
 - **Vector Store**: pgvector (via Supabase) or SQLite embeddings
-- **Analytics**: SQLite with indexed queries for fast analytics
-- **Rules Storage**: Supabase (production) or SQLite (development) with automatic
+IntegraChat supports **dual-backend storage** with automatic fallback:
+
+- **Supabase (Production/Preferred)**:
+  - `admin_rules` table - Admin rules with regex patterns and severity
+  - `tool_usage_events`, `redflag_violations`, `rag_search_events`, `agent_query_events` - Analytics tables
+  - Automatically used when `SUPABASE_URL` and `SUPABASE_SERVICE_KEY` are configured
+  - Supports Row Level Security (RLS) for multi-tenant isolation
+  - Scalable, production-ready with automatic backups
+
+- **SQLite (Development Fallback)**:
+  - `data/admin_rules.db` - Admin rules (local)
+  - `data/analytics.db` - Analytics events (local)
+  - Used automatically when Supabase credentials are not available
+  - Perfect for local development and testing
+
+**Migration**: Use `python migrate_sqlite_to_supabase.py` to copy existing SQLite data to Supabase. See `SUPABASE_SETUP.md` for detailed setup instructions.
+
+---
+
+## Supabase Setup & Migration
+
+IntegraChat supports Supabase for production-ready storage of admin rules and analytics. Both `RulesStore` and `AnalyticsStore` automatically detect and use Supabase when credentials are available, falling back to SQLite for local development.
+
+### Quick Setup
+
+1. **Create Supabase tables**:
+   - Run `supabase_admin_rules_table.sql` in the Supabase SQL Editor
+   - Run `supabase_analytics_tables.sql` in the Supabase SQL Editor
+
+2. **Configure environment variables** in `.env`:
+   ```env
+   SUPABASE_URL=https://your-project-id.supabase.co
+   SUPABASE_SERVICE_KEY=your_service_role_key_here
+   ```
+
+3. **Migrate existing data** (optional):
+   ```bash
+   python migrate_sqlite_to_supabase.py
+   ```
+
+4. **Verify setup**:
+   ```bash
+   python verify_supabase_setup.py
+   ```
+
+See `SUPABASE_SETUP.md` and `SUPABASE_MIGRATION_COMPLETE.md` for detailed instructions and troubleshooting.

 ---
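The migration step above boils down to reading rows out of the local SQLite databases and writing them to Supabase in batches. A minimal sketch of that flow, assuming a simplified `admin_rules` schema; `read_admin_rules` and `insert_rows` are illustrative names, not the actual internals of `migrate_sqlite_to_supabase.py`:

```python
import sqlite3


def read_admin_rules(db_path):
    """Read all admin rules from the local SQLite database as dicts."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    rows = conn.execute("SELECT * FROM admin_rules").fetchall()
    conn.close()
    return [dict(r) for r in rows]


def migrate(db_path, insert_rows, batch_size=100):
    """Copy SQLite rows to the destination store in batches.

    `insert_rows` stands in for the Supabase write (REST API or a
    direct PostgreSQL connection); here it is any callable that
    accepts a list of row dicts. Returns the number of rows copied.
    """
    rows = read_admin_rules(db_path)
    for i in range(0, len(rows), batch_size):
        insert_rows(rows[i:i + batch_size])
    return len(rows)
```

Batching keeps each request to the destination small, which matters when pushing analytics tables with many events through the REST API.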
+- `python verify_supabase_setup.py`
+  Verifies Supabase configuration and shows which backend (Supabase or SQLite) each store is using. Displays any missing configuration and provides a summary of where data will be saved.
+
+- `python check_supabase_rules.py`
+  Checks Supabase admin rules configuration and RLS policies. Validates that rules can be read/written correctly.
+
+- `python migrate_sqlite_to_supabase.py`
+  One-shot migration script that copies existing SQLite data (admin rules + analytics) to Supabase. Supports both PostgreSQL direct connection and Supabase REST API methods.

 - `python test_manual.py`
   The existing manual test runner remains useful for smoke-testing analytics logging, admin rule CRUD, and API response codes. Run it whenever you adjust schemas or update MCP endpoints.
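The tenant isolation that `check_rag_database.py` exercises comes down to scoping every similarity search by `tenant_id` before ranking. A small in-memory sketch of that idea — the real `search_vectors()` runs the equivalent as SQL against pgvector, and the chunk/embedding layout here is illustrative:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def search_vectors(chunks, tenant_id, query_vec, top_k=3):
    """Rank stored chunks for one tenant only.

    Equivalent in spirit to the pgvector query:
        SELECT text FROM chunks
        WHERE tenant_id = %s          -- the isolation filter
        ORDER BY embedding <=> %s     -- cosine distance
        LIMIT %s
    """
    scoped = [c for c in chunks if c["tenant_id"] == tenant_id]
    scoped.sort(key=lambda c: cosine(c["embedding"], query_vec), reverse=True)
    return scoped[:top_k]
```

The key property to verify is that results never include another tenant's chunks, no matter how similar their embeddings are to the query.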
+- **Analytics**: Supabase (production) or SQLite (development) with indexed queries for fast analytics
+- **Rules Storage**: Supabase (production) or SQLite (development) with automatic detection and fallback
 - **MCP Server**: Unified MCP server (port 8900) exposing all tools via namespaces
 - **Database**: PostgreSQL with pgvector extension for RAG embeddings, SQLite for analytics
 - **File Processing**: Support for TXT, PDF, DOC, DOCX with server-side text extraction (PyPDF2, python-docx)
backend/README.md (CHANGED)

@@ -12,7 +12,10 @@ This folder contains the production-ready FastAPI stack plus the companion MCP s
 - Python 3.10+
 - PostgreSQL (with the `vector` extension) for RAG data, or Supabase with pgvector enabled
-- Supabase (
@@ -88,6 +91,9 @@ Use the helper scripts in the repo root when validating backend changes:

 - `python verify_tenant_isolation.py` – Exercises analytics logging, admin rule CRUD, API reachability, and proves RAG tenant isolation by ingesting + querying as multiple tenants.
 - `python check_rag_database.py` – Talks directly to the pgvector database to list tenant IDs, preview stored chunks, and run safeguarded searches via `search_vectors()`. Helpful when troubleshooting suspected cross-tenant leakage.
@@ -149,13 +155,58 @@ Defined in `env.example`:

 - `MCP_HOST` - Host for unified MCP server (default: 0.0.0.0)
 - `POSTGRESQL_URL` - PostgreSQL connection string with pgvector extension
 - `OLLAMA_URL`, `OLLAMA_MODEL` (or `GROQ_API_KEY` + `LLM_BACKEND=groq`)
-- `SUPABASE_URL`, `SUPABASE_SERVICE_KEY` (
 ## Unified MCP tool instructions

 Agents that speak the Model Context Protocol should connect to the `integrachat` server id defined in `backend/mcp_server/server.py` and call the namespaced tools directly:

@@ -198,3 +249,10 @@
+- **Supabase (recommended)** for admin rules + analytics storage, with automatic SQLite fallback in `data/`
+  - Both `RulesStore` and `AnalyticsStore` automatically detect and use Supabase when configured
+  - Falls back to SQLite automatically if Supabase credentials are missing
+  - See `SUPABASE_SETUP.md` in the root directory for setup instructions
 - Optional: Ollama running locally (default) or Groq API credentials for remote LLMs

 Create a virtual environment at the repo root, then:
+- `python verify_supabase_setup.py` – Verifies Supabase configuration and shows which backend (Supabase or SQLite) each store is using.
+- `python check_supabase_rules.py` – Checks Supabase admin rules configuration and RLS policies.
+- `python migrate_sqlite_to_supabase.py` – One-shot migration script to copy existing SQLite data to Supabase.
 - `python test_manual.py` – Legacy manual smoke test harness (analytics store, admin rules, API surface).

 > **Troubleshooting tip:** If the isolation script reports a failure, first run `check_rag_database.py` to confirm documents are tagged with the correct `tenant_id`, then restart the unified MCP server so it reloads the updated SQL filtering logic.
+- `SUPABASE_URL`, `SUPABASE_SERVICE_KEY` - **Required for Supabase backend** (admin rules + analytics)
+  - If not set, the system automatically falls back to SQLite in the `data/` directory
+  - See `SUPABASE_SETUP.md` in the root directory for detailed setup instructions
 - `APP_ENV`, `LOG_LEVEL`, `API_PORT`

 Update these before starting the servers to ensure the agent can reach every MCP endpoint and LLM runtime.

 **Note**: The unified MCP server runs on a single port (default 8900) and handles all namespaced tools. The `start.bat` script automatically configures the correct URLs.
+## Supabase Configuration
+
+Both `RulesStore` and `AnalyticsStore` support dual-backend storage with automatic detection:
+
+### Setup Steps
+
+1. **Create Supabase tables**:
+   - Run `supabase_admin_rules_table.sql` in the Supabase SQL Editor (from repo root)
+   - Run `supabase_analytics_tables.sql` in the Supabase SQL Editor (from repo root)
+
+2. **Configure environment variables** in `.env`:
+   ```env
+   SUPABASE_URL=https://your-project-id.supabase.co
+   SUPABASE_SERVICE_KEY=your_service_role_key_here
+   ```
+
+3. **Verify configuration**:
+   ```bash
+   python verify_supabase_setup.py
+   ```
+
+4. **Migrate existing data** (if you have SQLite data):
+   ```bash
+   python migrate_sqlite_to_supabase.py
+   ```
+
+### How It Works
+
+- **Automatic Detection**: Both stores check for `SUPABASE_URL` and `SUPABASE_SERVICE_KEY` at initialization
+- **Supabase First**: If credentials are found, Supabase is used automatically
+- **SQLite Fallback**: If Supabase is not configured, SQLite databases in `data/` are used
+- **Startup Logging**: Check startup logs to see which backend each store is using:
+  - `✅ RulesStore: Using Supabase backend`
+  - `✅ AnalyticsStore: Using Supabase backend`
+  - Or `⚠️ RulesStore: Using SQLite backend` if Supabase is not configured
+
+### Tables Used
+
+- **Admin Rules**: `admin_rules` table in Supabase
+- **Analytics**: `tool_usage_events`, `redflag_violations`, `rag_search_events`, `agent_query_events`
+
+See `SUPABASE_SETUP.md` and `SUPABASE_MIGRATION_COMPLETE.md` in the root directory for detailed instructions and troubleshooting.
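The automatic detection described above is just an environment check at store initialization. A rough sketch of the idea — `choose_backend` is an illustrative name, not the actual `RulesStore`/`AnalyticsStore` initializer:

```python
import os


def choose_backend(store_name="RulesStore", env=None):
    """Pick Supabase when both credentials are present, else SQLite.

    Mirrors the startup-log convention used by the stores:
    a checkmark for Supabase, a warning for the SQLite fallback.
    """
    env = os.environ if env is None else env
    url = env.get("SUPABASE_URL")
    key = env.get("SUPABASE_SERVICE_KEY")
    if url and key:
        print(f"✅ {store_name}: Using Supabase backend")
        return "supabase"
    print(f"⚠️ {store_name}: Using SQLite backend")
    return "sqlite"
```

Note that both variables must be set: a URL without the service role key still falls back to SQLite, which is the most common cause of "data still going to SQLite" surprises.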
 - **Tenant ID mismatch**: The system normalizes tenant IDs, but ensure you're using the same tenant_id format as when documents were ingested
 - **Check logs**: Database deletion logs show detailed information about tenant ID matching and document existence

+### Supabase Configuration Issues
+
+- **Data still going to SQLite**: Check that `SUPABASE_URL` and `SUPABASE_SERVICE_KEY` are set correctly in `.env` (no quotes, no spaces)
+- **Service role key errors**: Make sure you're using the **service_role** key (not the anon key) from Supabase Dashboard → Settings → API
+- **Tables don't exist**: Run `supabase_admin_rules_table.sql` and `supabase_analytics_tables.sql` in the Supabase SQL Editor
+- **Permission errors**: Check that RLS policies in Supabase allow service role access
+- **Startup warnings**: Check the FastAPI startup logs to see which backend each store is using (`✅` for Supabase, `⚠️` for SQLite fallback)