# Security Notes

This is a **reference starter**, not a production system. Use these notes to harden before deploying.

## What is already safe-by-default

- **Tenant isolation**: every row carries a `tenant_id`, and the search query only returns rows matching `tenant_id = X OR visibility = public`, where `X` is the requesting tenant.
- **Public vs restricted**: `restricted` items never leak to other tenants unless explicitly requested.
- **Trust classes**: retrieval can be gated by `max_rank` so low-trust content is excluded.
- **Retrieval logging**: every search is logged with filters and result IDs for audit.
- **No private credentials**: the repo contains no tokens, hostnames, or internal paths.
- **Synthetic data only**: all seed data is public-safe and fabricated.
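
The isolation and trust-class rules above can be read as a single predicate. The following is a minimal sketch, not the starter's actual query code. It assumes each row exposes `tenant_id`, `visibility`, and a numeric `trust_rank` (a hypothetical field name), and that a lower rank means higher trust, so that `max_rank` acts as an upper bound.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Item:
    id: int
    tenant_id: str
    visibility: str  # "public" or "restricted"
    trust_rank: int  # hypothetical field; assumed lower = more trusted


def is_retrievable(item: Item, tenant_id: str, max_rank: Optional[int] = None) -> bool:
    """Mirror `tenant_id = X OR visibility = public`, plus the optional max_rank gate."""
    tenant_ok = item.tenant_id == tenant_id or item.visibility == "public"
    rank_ok = max_rank is None or item.trust_rank <= max_rank
    return tenant_ok and rank_ok


# Fabricated rows, for illustration only.
items = [
    Item(1, "acme", "restricted", 1),    # own tenant, high trust
    Item(2, "globex", "restricted", 1),  # other tenant, restricted
    Item(3, "globex", "public", 3),      # other tenant, public, low trust
]

visible_ids = [i.id for i in items if is_retrievable(i, "acme", max_rank=2)]
ungated_ids = [i.id for i in items if is_retrievable(i, "acme")]
```

With `max_rank=2`, only item 1 passes for tenant `acme`; without the gate, the public item 3 is also returned. A predicate like this is convenient for unit-testing the filter logic independently of the database.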

## What you must add for production

1. **Authentication / authorization**
   - Add OAuth2, API keys, or mutual TLS.
   - Bind `tenant_id` to the authenticated user; never accept it from the request body.
   - Enable Postgres Row-Level Security (RLS) and tie policies to application-level user IDs.

2. **Input validation**
   - Limit `content` size (e.g. 100 KB) to prevent storage abuse.
   - Validate `metadata` JSON against an allowlist and reject unexpected keys.
   - Rate-limit writes per tenant.

3. **Network security**
   - Do not expose Postgres port `5432` to the internet.
   - Run the API and DB in a private VPC or behind a reverse proxy.
   - Use TLS for all client↔API and API↔Postgres connections.

4. **Secrets management**
   - Rotate `POSTGRES_PASSWORD` immediately; store it in a secrets manager (e.g. HashiCorp Vault, AWS Secrets Manager).
   - Never commit `.env` files with real passwords.

5. **Observability**
   - Alert on abnormal retrieval patterns (e.g. tenant A querying tenant B's data).
   - Monitor `retrieval_logs` for signs of probing or data exfiltration.

6. **Backup and encryption**
   - Encrypt Postgres volumes at rest.
   - Schedule automated backups and test restores.
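
Items 1 and 2 above can be combined in a single write-path check. The sketch below is one possible shape, under stated assumptions: `build_write_record`, `ValidationError`, and the metadata allowlist are hypothetical names, and `auth_tenant_id` is assumed to come from whatever credential check you add (token, API key, or client certificate). The 100 KB limit is the example figure from the list.

```python
from typing import Any, Dict

MAX_CONTENT_BYTES = 100 * 1024  # 100 KB, the example limit from the notes
ALLOWED_METADATA_KEYS = frozenset({"source", "title", "tags"})  # hypothetical allowlist


class ValidationError(ValueError):
    """Raised when a write request fails validation."""


def build_write_record(auth_tenant_id: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a write request and return the record to store.

    auth_tenant_id must come from the verified credential, never from `body`.
    """
    # Reject (rather than silently ignore) a client-supplied tenant_id so
    # that misbehaving clients are visible in error logs.
    if "tenant_id" in body:
        raise ValidationError("tenant_id must not be supplied in the request body")

    content = body.get("content")
    if not isinstance(content, str):
        raise ValidationError("content must be a string")
    # Measure encoded bytes, not characters, since storage cost is in bytes.
    if len(content.encode("utf-8")) > MAX_CONTENT_BYTES:
        raise ValidationError("content exceeds size limit")

    metadata = body.get("metadata", {})
    if not isinstance(metadata, dict):
        raise ValidationError("metadata must be a JSON object")
    unexpected = set(metadata) - ALLOWED_METADATA_KEYS
    if unexpected:
        raise ValidationError(f"unexpected metadata keys: {sorted(unexpected)}")

    return {"tenant_id": auth_tenant_id, "content": content, "metadata": metadata}


record = build_write_record("acme", {"content": "hello", "metadata": {"source": "demo"}})
```

Here `record["tenant_id"]` is always the authenticated tenant, regardless of what the client sent. Rate limiting and Row-Level Security are not covered by this sketch and still need to be added separately.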

## Known limitations

- No authentication layer is included (by design, to keep the starter runnable).
- Placeholder embeddings are not semantically meaningful; swap in a real model before any serious use.
- HNSW index parameters (`m=16`, `ef_construction=64`) are starter defaults; tune for your data size and recall requirements.
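
When tuning the HNSW parameters, it can help to generate the index DDL from one place so that experiments stay reproducible. The sketch below assumes the index is provided by the pgvector extension (the notes do not name it) and uses hypothetical table, column, and index names (`items`, `embedding`, `items_embedding_hnsw`); adjust them to the actual schema.

```python
def hnsw_index_sql(
    table: str,
    column: str,
    m: int = 16,
    ef_construction: int = 64,
    opclass: str = "vector_cosine_ops",
) -> str:
    """Build a pgvector-style CREATE INDEX statement for an HNSW index."""
    # Identifiers cannot be bound as query parameters, so only plain
    # Python-style identifiers are accepted to avoid SQL injection.
    for name in (table, column, opclass):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    if m <= 0 or ef_construction <= 0:
        raise ValueError("m and ef_construction must be positive integers")
    index_name = f"{table}_{column}_hnsw"
    return (
        f"CREATE INDEX IF NOT EXISTS {index_name} ON {table} "
        f"USING hnsw ({column} {opclass}) "
        f"WITH (m = {int(m)}, ef_construction = {int(ef_construction)});"
    )


# Starter defaults from the notes above.
default_sql = hnsw_index_sql("items", "embedding")
# A denser graph: typically better recall, slower builds, more memory.
tuned_sql = hnsw_index_sql("items", "embedding", m=32, ef_construction=128)
```

Larger `m` and `ef_construction` values generally improve recall at the cost of build time and index size, so measure recall and latency on representative data before settling on values.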