# Security Notes

This is a **reference starter**, not a production system. Use these notes to harden before deploying.

## What is already safe-by-default

- **Tenant isolation**: the `tenant_id` column isolates data. The search query enforces `tenant_id = X OR visibility = public`.
- **Public vs restricted**: `restricted` items never leak to other tenants unless explicitly requested.
- **Trust classes**: retrieval can be gated by `max_rank` so low-trust content is excluded.
- **Retrieval logging**: every search is logged with filters and result IDs for audit.
- **No private credentials**: the repo contains no tokens, hostnames, or internal paths.
- **Synthetic data only**: all seed data is public-safe and fabricated.

## What you must add for production

1. **Authentication / authorization**
   - Add OAuth2, API keys, or mutual TLS.
   - Bind `tenant_id` to the authenticated user; never accept it from the request body.
   - Enable Postgres Row-Level Security (RLS) and tie policies to application-level user IDs.
2. **Input validation**
   - Limit `content` size (e.g. 100 KB) to prevent storage abuse.
   - Sanitize `metadata` JSON to reject unexpected keys.
   - Rate-limit writes per tenant.
3. **Network security**
   - Do not expose Postgres port `5432` to the internet.
   - Run the API and DB in a private VPC or behind a reverse proxy.
   - Use TLS for all client↔API and API↔Postgres connections.
4. **Secrets management**
   - Rotate `POSTGRES_PASSWORD` immediately; store it in a secrets manager (e.g. HashiCorp Vault, AWS Secrets Manager).
   - Never commit `.env` files with real passwords.
5. **Observability**
   - Alert on abnormal retrieval patterns (e.g. tenant A querying tenant B data).
   - Monitor `retrieval_logs` for signs of probing or data exfiltration.
6. **Backup and encryption**
   - Encrypt Postgres volumes at rest.
   - Schedule automated backups and test restores.

## Known limitations

- No authentication layer is included (by design, to keep the starter runnable).
- Placeholder embeddings are not semantically meaningful; swap in a real model before any serious use.
- HNSW index parameters (`m=16`, `ef_construction=64`) are starter defaults; tune for your data size and recall requirements.
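
The tenant filter and `max_rank` gate listed under "What is already safe-by-default" can be illustrated with a minimal sketch. It is not the starter's actual query: it uses an in-memory SQLite database so it runs anywhere, and the table name `items`, the column `trust_rank`, and the convention that a lower rank means higher trust are all assumptions made for illustration. The point it shows is that `tenant_id` is passed as a bound parameter supplied by the server, not concatenated from request input.

```python
import sqlite3


def search_items(conn, tenant_id, max_rank=None):
    """Return IDs visible to tenant_id: its own rows plus public rows.

    tenant_id should come from the authenticated session, never from the
    request body. Table and column names here are hypothetical.
    """
    sql = "SELECT id FROM items WHERE (tenant_id = ? OR visibility = 'public')"
    params = [tenant_id]
    if max_rank is not None:
        # Assumption: lower trust_rank means more trusted content.
        sql += " AND trust_rank <= ?"
        params.append(max_rank)
    sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


def demo_connection():
    """Build an in-memory database with fabricated rows for two tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT, "
        "visibility TEXT, trust_rank INTEGER)"
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?, ?)",
        [
            (1, "tenant_a", "restricted", 1),
            (2, "tenant_b", "restricted", 1),
            (3, "tenant_b", "public", 2),
            (4, "tenant_a", "public", 5),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(search_items(conn, "tenant_a"))
    print(search_items(conn, "tenant_a", max_rank=2))
```

In a Postgres deployment the same predicate would normally be backed by an RLS policy as well, so that a bug in application code cannot drop the filter.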
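
The input-validation items under "What you must add for production" could look roughly like the sketch below. The 100 KB limit comes from the notes above; the allowlist of metadata keys and the names `validate_item` and `ValidationError` are hypothetical, and a real deployment would pick keys that match its own schema.

```python
import json

MAX_CONTENT_BYTES = 100 * 1024  # 100 KB limit suggested in the notes
# Hypothetical allowlist; replace with the keys your schema actually uses.
ALLOWED_METADATA_KEYS = {"source", "tags", "language"}


class ValidationError(ValueError):
    """Raised when a write request fails validation."""


def validate_item(content: str, metadata_json: str) -> dict:
    """Check content size and metadata keys; return the parsed metadata."""
    # Measure encoded bytes, not characters, since storage cost is in bytes.
    if len(content.encode("utf-8")) > MAX_CONTENT_BYTES:
        raise ValidationError("content exceeds size limit")
    try:
        metadata = json.loads(metadata_json)
    except json.JSONDecodeError as exc:
        raise ValidationError("metadata is not valid JSON") from exc
    if not isinstance(metadata, dict):
        raise ValidationError("metadata must be a JSON object")
    unexpected = set(metadata) - ALLOWED_METADATA_KEYS
    if unexpected:
        raise ValidationError(f"unexpected metadata keys: {sorted(unexpected)}")
    return metadata
```

Rejecting unknown keys outright, instead of silently dropping them, makes probing attempts visible to the caller and to any logging placed around the validator.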