# Security Notes

This is a **reference starter**, not a production system. Use these notes to harden it before deploying.

## What is already safe-by-default

- **Tenant isolation**: the `tenant_id` column isolates data. The search query enforces `tenant_id = X OR visibility = public`.
- **Public vs restricted**: `restricted` items never leak to other tenants unless explicitly requested.
- **Trust classes**: retrieval can be gated by `max_rank` so low-trust content is excluded.
- **Retrieval logging**: every search is logged with filters and result IDs for audit.
- **No private credentials**: the repo contains no tokens, hostnames, or internal paths.
- **Synthetic data only**: all seed data is public-safe and fabricated.

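The tenant-isolation and trust-class rules in this section can be sketched as a small query-filter builder. This is an illustration, not the starter's actual query code: the function name `build_search_filter` and the column name `trust_rank` are hypothetical, the `%s` placeholders assume a psycopg-style driver, and the sketch assumes that a larger rank means lower trust.

```python
from typing import List, Optional, Tuple


def build_search_filter(
    tenant_id: str, max_rank: Optional[int] = None
) -> Tuple[str, List[object]]:
    """Return a WHERE clause and its parameters for a tenant-scoped search."""
    # Values travel as bound parameters; they are never interpolated into SQL.
    clauses = ["(tenant_id = %s OR visibility = 'public')"]
    params: List[object] = [tenant_id]
    if max_rank is not None:
        # Hypothetical column `trust_rank`; assumes larger rank = lower trust.
        clauses.append("trust_rank <= %s")
        params.append(max_rank)
    return " AND ".join(clauses), params
```

The returned clause and parameter list would be passed together to the driver's `execute` call, so the tenant value can never alter the shape of the query.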
## What you must add for production

1. **Authentication / authorization**
   - Add OAuth2, API keys, or mutual TLS.
   - Bind `tenant_id` to the authenticated user; never accept it from the request body.
   - Enable Postgres Row-Level Security (RLS) and tie policies to application-level user IDs.

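Binding `tenant_id` to the authenticated user might look like the minimal sketch below. It assumes the authentication layer has already verified the caller and produced a `claims` mapping (for example, from a validated token); the claim name `tenant_id` and the function name `resolve_tenant_id` are hypothetical.

```python
from typing import Any, Mapping


def resolve_tenant_id(claims: Mapping[str, Any], body: Mapping[str, Any]) -> str:
    """Return the tenant from verified claims; never trust the request body."""
    tenant_id = claims.get("tenant_id")  # hypothetical claim name
    if not isinstance(tenant_id, str) or not tenant_id:
        raise PermissionError("authenticated identity carries no tenant")
    # A client-supplied tenant_id is never used. If one is present and it
    # disagrees with the authenticated tenant, reject the request outright.
    supplied = body.get("tenant_id")
    if supplied is not None and supplied != tenant_id:
        raise PermissionError("tenant_id in body does not match authenticated tenant")
    return tenant_id
```

Rejecting a mismatched value, rather than silently ignoring it, is a design choice here: it surfaces probing attempts early and gives the audit trail something to record.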
2. **Input validation**
   - Limit `content` size (e.g. 100 KB) to prevent storage abuse.
   - Validate `metadata` JSON against an allowlist and reject unexpected keys.
   - Rate-limit writes per tenant.

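The three input-validation points could be sketched as follows. The allowlisted metadata keys, the names `validate_item` and `WriteRateLimiter`, and the limiter's settings are all hypothetical, and the limiter keeps state in process memory, so a multi-instance deployment would need a shared store instead.

```python
import time
from collections import defaultdict, deque
from typing import Any, Deque, Dict, Optional

MAX_CONTENT_BYTES = 100 * 1024  # the 100 KB example limit
ALLOWED_METADATA_KEYS = frozenset({"source", "title", "tags"})  # hypothetical


def validate_item(content: str, metadata: Dict[str, Any]) -> None:
    """Raise ValueError if content is too large or metadata has unknown keys."""
    if len(content.encode("utf-8")) > MAX_CONTENT_BYTES:
        raise ValueError("content exceeds size limit")
    unexpected = set(metadata) - ALLOWED_METADATA_KEYS
    if unexpected:
        raise ValueError(f"unexpected metadata keys: {sorted(unexpected)}")


class WriteRateLimiter:
    """Sliding window: at most `limit` writes per tenant per `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, tenant_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        events = self._events[tenant_id]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True
```

Size is measured in encoded bytes rather than characters, since storage cost follows the byte count.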
3. **Network security**
   - Do not expose Postgres port `5432` to the internet.
   - Run the API and DB in a private VPC or behind a reverse proxy.
   - Use TLS for all client↔API and API↔Postgres connections.

4. **Secrets management**
   - Rotate `POSTGRES_PASSWORD` immediately; store it in a secrets manager (e.g. HashiCorp Vault, AWS Secrets Manager).
   - Never commit `.env` files with real passwords.

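One way to make an unrotated password fail loudly is a startup check like the sketch below. It assumes the secrets manager injects `POSTGRES_PASSWORD` into the process environment at deploy time; the function name and the list of placeholder values are hypothetical and should be adapted to whatever defaults your copy of the starter ships with.

```python
import os
from typing import Mapping, Optional

# Hypothetical placeholder values that should never reach production.
PLACEHOLDER_PASSWORDS = frozenset({"", "postgres", "password", "changeme"})


def load_postgres_password(env: Optional[Mapping[str, str]] = None) -> str:
    """Read POSTGRES_PASSWORD, refusing to start with a missing or default value."""
    env = os.environ if env is None else env
    password = env.get("POSTGRES_PASSWORD", "")
    if password in PLACEHOLDER_PASSWORDS:
        raise RuntimeError("POSTGRES_PASSWORD is unset or still a placeholder")
    return password
```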
5. **Observability**
   - Alert on abnormal retrieval patterns (e.g. tenant A querying tenant B's data).
   - Monitor `retrieval_logs` for signs of probing or data exfiltration.

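A cross-tenant check over the retrieval logs could be sketched as below. The record shapes are assumptions: each log entry is taken to be a mapping with `id`, `tenant_id`, and `result_ids`, and `items` maps an item ID to its owning `tenant_id` and `visibility`. The real `retrieval_logs` schema may differ.

```python
from typing import Any, Iterable, List, Mapping, Tuple


def find_cross_tenant_hits(
    log_entries: Iterable[Mapping[str, Any]],
    items: Mapping[str, Mapping[str, str]],
) -> List[Tuple[Any, str]]:
    """Return (log_id, item_id) pairs where a non-public item owned by
    another tenant appeared in a tenant's search results."""
    hits: List[Tuple[Any, str]] = []
    for entry in log_entries:
        for item_id in entry["result_ids"]:
            item = items.get(item_id)
            if item is None:
                continue  # item deleted since the search; nothing to compare
            foreign = item["tenant_id"] != entry["tenant_id"]
            if foreign and item["visibility"] != "public":
                hits.append((entry["id"], item_id))
    return hits
```

Any non-empty result would indicate that the isolation filter was bypassed, which makes it a reasonable trigger for an alert.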
6. **Backup and encryption**
   - Encrypt Postgres volumes at rest.
   - Schedule automated backups and test restores.

## Known limitations

- No authentication layer is included (by design, to keep the starter runnable).
- Placeholder embeddings are not semantically meaningful; swap in a real model before any serious use.
- HNSW index parameters (`m=16`, `ef_construction=64`) are starter defaults; tune for your data size and recall requirements.

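When tuning the HNSW parameters, it can help to generate the index statement from explicit values. The sketch below assumes the index is provided by the pgvector extension and uses its `hnsw` index syntax with the `vector_cosine_ops` operator class; the helper name, the table name `items`, and the column name `embedding` are hypothetical.

```python
def hnsw_index_sql(
    table: str, column: str, m: int = 16, ef_construction: int = 64
) -> str:
    """Build a pgvector-style CREATE INDEX statement for an HNSW index."""
    # Identifiers are interpolated into DDL, so accept only plain names.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"not a plain identifier: {name!r}")
    if m <= 0 or ef_construction <= 0:
        raise ValueError("m and ef_construction must be positive")
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


# Starter defaults, with hypothetical table and column names.
statement = hnsw_index_sql("items", "embedding")
```

Larger values of either parameter generally trade slower index builds and more memory for better recall, so changes are worth measuring against a representative query set.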