# Deploying rag-psych to Fly.io Architecture: **Fly.io for the API container, Neon for Postgres + pgvector.** Both have free tiers that cover the demo. The API auto-stops when idle so the bill stays at $0 between visits. > **Critical:** never run `--sources dsm5` against the remote DB. The DSM > chunks are licensed for local personal use only and must stay in your > laptop's `pgdata` volume. The `scripts/seed_remote.sh` helper rejects > any attempt to ingest DSM remotely. --- ## 1. Provision Postgres on Neon (free) 1. Sign up at [console.neon.tech](https://console.neon.tech) (GitHub auth, free). 2. Create a new project. Region: pick one near your Fly region (we'll use `ord` below; Neon's `aws-us-east-2` is closest). 3. In the project dashboard, go to **Settings → Extensions** and enable `vector`. (One toggle. Neon ships pgvector pre-installed; you just activate it.) 4. Copy the **Connection string** from the dashboard. It looks like: ``` postgresql://USER:PASSWORD@ep-xyz-12345.us-east-2.aws.neon.tech/rag_psych?sslmode=require ``` 5. Apply our schema. From your laptop: ```bash psql 'postgresql://USER:PASSWORD@ep-xyz-12345.us-east-2.aws.neon.tech/rag_psych?sslmode=require' \ -f ingest/schema.sql ``` --- ## 2. Install Fly's CLI and authenticate ```bash brew install flyctl # macOS; see fly.io/docs/hands-on/install-flyctl/ for others fly auth signup # or: fly auth login (browser) ``` Free trial credit covers the demo for the first month. After that the runtime cost is dominated by uptime, which scale-to-zero pushes near zero. --- ## 3. Launch the Fly app (no deploy yet) From the repo root: ```bash fly launch --no-deploy --copy-config --name rag-psych- ``` When prompted: - **Region**: pick the same one your Neon DB is closest to (e.g. `ord`) - **Postgres**: **No** (we're using Neon, not Fly Postgres) - **Redis**: No - **Settings**: keep the existing `fly.toml` (the `--copy-config` flag preserves it) Open the generated `fly.toml` and update the `app = "..."` line to match the unique name Fly assigned (the launcher overwrites it). --- ## 4. Set secrets These never appear in `fly.toml` or the image. They live encrypted in Fly's secret store and are injected as env vars at runtime. ```bash fly secrets set \ DATABASE_URL='postgresql://USER:PASSWORD@ep-xyz.us-east-2.aws.neon.tech/rag_psych?sslmode=require' \ ANTHROPIC_API_KEY='sk-ant-api03-...' \ ANTHROPIC_MODEL='claude-haiku-4-5' \ NCBI_EMAIL='you@example.com' \ ICD_CLIENT_ID='...' \ ICD_CLIENT_SECRET='...' \ EVAL_PASSWORD="$(python3 -c 'import secrets; print(secrets.token_urlsafe(16))')" \ CORS_ORIGIN='https://rag-psych-.fly.dev' ``` > ⚠️ **Rotate your local `.env` keys before deploying.** The > `ANTHROPIC_API_KEY` and `EVAL_PASSWORD` currently in `.env` should be > regenerated for production — assume the old ones are compromised > (anything that touches a chat with an LLM is). Console: > [console.anthropic.com/settings/keys](https://console.anthropic.com/settings/keys). --- ## 5. Deploy ```bash fly deploy ``` First push uploads ~3.5 GB (models baked into the image — see `api/Dockerfile`). Subsequent deploys only push the layers that changed, so code edits redeploy in seconds. When it finishes: ```bash fly status # check machine health fly logs # follow startup logs open https://rag-psych-.fly.dev/health ``` A `{"status":"ok"}` response means the API is up. The first `/query` will pay a 5–10 s cold-start while the embedder + reranker load. --- ## 6. Seed the remote database The Neon DB is empty after step 1. Run ingest from your laptop against it: ```bash DATABASE_URL='postgresql://USER:PASSWORD@ep-xyz.us-east-2.aws.neon.tech/rag_psych?sslmode=require' \ ./scripts/seed_remote.sh ``` This takes 10–15 minutes wall time, mostly the PubMed `efetch` loop on the first run. The cached JSON files in `data/cache/` mean re-runs are near-instant. After it finishes, hit the deployed UI: ```bash open https://rag-psych-.fly.dev/ui ``` --- ## 7. Production hardening (recommended before sharing publicly) | Item | How | |---|---| | Anthropic spend cap | Drop to **$5/week** at console.anthropic.com → Settings → Limits before sharing the URL | | `/eval` password | Confirm `EVAL_PASSWORD` is random + long. Never the local-dev value. | | Rate limit | Already 30/min/IP. If you see abuse, drop to 10/min in `api/main.py`'s `@limiter.limit` decorator and `fly deploy` | | CORS | Only the Fly subdomain in production. If you remove the localhost dev origin from the Fly secret, dev still works locally because your `.env` has its own value | | Healthcheck grace | `fly.toml` already gives 30 s for cold-start. If you see flapping, bump to 60 s | | Auto-stop | Already on (`auto_stop_machines = "stop"`). Verify with `fly status` — idle machines should report `stopped` | --- ## 8. Day-2 ops cheatsheet ```bash fly status # which machines are running fly logs # tail combined logs fly logs --instance # one machine fly ssh console # shell into a running instance fly secrets list # what's set (values not shown) fly secrets unset SOME_KEY # remove fly scale memory 4096 # bump RAM if rerank latency is bad fly scale count 2 # add a second machine for redundancy fly apps destroy rag-psych-<...> # nuke everything (careful) ``` --- ## 9. Things you'll feel in production that you don't on localhost - **First request is slow** (5–10 s) when the machine just woke from auto-stop. Subsequent requests in the same minute are fast. - **Latency floor is higher** because every request crosses the public internet (Fly ↔ Neon) instead of localhost. Expect ~2× the times shown in `eval/results/*.json`. - **Anthropic costs scale with usage.** A naive demo URL hit by 100 curious visitors at 5 queries each = ~$2 of Haiku tokens. Spend cap protects you. - **No DSM in answers.** If a query that worked locally suddenly returns the refusal string in production, you're hitting the DSM-shaped hole in the public corpus — that's expected. --- ## Why not Fly Postgres? Fly's managed Postgres is fine but you'd have to install pgvector via a custom Postgres image and manage it yourself, then pay $5/mo for the smallest instance. Neon's free tier is functionally equivalent for our scale (~30K chunks) and has zero setup beyond toggling the extension. If you want everything inside Fly's network later (lower latency, fewer external dependencies), swap `DATABASE_URL` to a Fly Postgres endpoint and re-run step 6. The application code doesn't care.