File size: 6,819 Bytes
08fc97e | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 | # Deploying rag-psych to Fly.io
Architecture: **Fly.io for the API container, Neon for Postgres + pgvector.**
Both have free tiers that cover the demo. The API auto-stops when idle so
the bill stays at $0 between visits.
> **Critical:** never run `--sources dsm5` against the remote DB. The DSM
> chunks are licensed for local personal use only and must stay in your
> laptop's `pgdata` volume. The `scripts/seed_remote.sh` helper rejects
> any attempt to ingest DSM remotely.
---
## 1. Provision Postgres on Neon (free)
1. Sign up at [console.neon.tech](https://console.neon.tech) (GitHub auth, free).
2. Create a new project. Region: pick one near your Fly region (we'll use
`ord` below; Neon's `aws-us-east-2` is closest).
3. In the project dashboard, go to **Settings β Extensions** and enable
`vector`. (One toggle. Neon ships pgvector pre-installed; you just
activate it.)
4. Copy the **Connection string** from the dashboard. It looks like:
```
postgresql://USER:PASSWORD@ep-xyz-12345.us-east-2.aws.neon.tech/rag_psych?sslmode=require
```
5. Apply our schema. From your laptop:
```bash
psql 'postgresql://USER:PASSWORD@ep-xyz-12345.us-east-2.aws.neon.tech/rag_psych?sslmode=require' \
-f ingest/schema.sql
```
---
## 2. Install Fly's CLI and authenticate
```bash
brew install flyctl # macOS; see fly.io/docs/hands-on/install-flyctl/ for others
fly auth signup # or: fly auth login (browser)
```
Free trial credit covers the demo for the first month. After that the
runtime cost is dominated by uptime, which scale-to-zero pushes near zero.
---
## 3. Launch the Fly app (no deploy yet)
From the repo root:
```bash
fly launch --no-deploy --copy-config --name rag-psych-<your-suffix>
```
When prompted:
- **Region**: pick the same one your Neon DB is closest to (e.g. `ord`)
- **Postgres**: **No** (we're using Neon, not Fly Postgres)
- **Redis**: No
- **Settings**: keep the existing `fly.toml` (the `--copy-config` flag preserves it)
Open the generated `fly.toml` and update the `app = "..."` line to match
the unique name Fly assigned (the launcher overwrites it).
---
## 4. Set secrets
These never appear in `fly.toml` or the image. They live encrypted in
Fly's secret store and are injected as env vars at runtime.
```bash
fly secrets set \
DATABASE_URL='postgresql://USER:PASSWORD@ep-xyz.us-east-2.aws.neon.tech/rag_psych?sslmode=require' \
ANTHROPIC_API_KEY='sk-ant-api03-...' \
ANTHROPIC_MODEL='claude-haiku-4-5' \
NCBI_EMAIL='you@example.com' \
ICD_CLIENT_ID='...' \
ICD_CLIENT_SECRET='...' \
EVAL_PASSWORD="$(python3 -c 'import secrets; print(secrets.token_urlsafe(16))')" \
CORS_ORIGIN='https://rag-psych-<your-suffix>.fly.dev'
```
> β οΈ **Rotate your local `.env` keys before deploying.** The
> `ANTHROPIC_API_KEY` and `EVAL_PASSWORD` currently in `.env` should be
> regenerated for production β assume the old ones are compromised
> (anything that touches a chat with an LLM is). Console:
> [console.anthropic.com/settings/keys](https://console.anthropic.com/settings/keys).
---
## 5. Deploy
```bash
fly deploy
```
First push uploads ~3.5 GB (models baked into the image β see
`api/Dockerfile`). Subsequent deploys only push the layers that changed,
so code edits redeploy in seconds.
When it finishes:
```bash
fly status # check machine health
fly logs # follow startup logs
open https://rag-psych-<your-suffix>.fly.dev/health
```
A `{"status":"ok"}` response means the API is up. The first `/query`
will pay a 5β10 s cold-start while the embedder + reranker load.
---
## 6. Seed the remote database
The Neon DB is empty after step 1. Run ingest from your laptop against it:
```bash
DATABASE_URL='postgresql://USER:PASSWORD@ep-xyz.us-east-2.aws.neon.tech/rag_psych?sslmode=require' \
./scripts/seed_remote.sh
```
This takes 10β15 minutes wall time, mostly the PubMed `efetch` loop on
the first run. The cached JSON files in `data/cache/` mean re-runs are
near-instant.
After it finishes, hit the deployed UI:
```bash
open https://rag-psych-<your-suffix>.fly.dev/ui
```
---
## 7. Production hardening (recommended before sharing publicly)
| Item | How |
|---|---|
| Anthropic spend cap | Drop to **$5/week** at console.anthropic.com β Settings β Limits before sharing the URL |
| `/eval` password | Confirm `EVAL_PASSWORD` is random + long. Never the local-dev value. |
| Rate limit | Already 30/min/IP. If you see abuse, drop to 10/min in `api/main.py`'s `@limiter.limit` decorator and `fly deploy` |
| CORS | Only the Fly subdomain in production. If you remove the localhost dev origin from the Fly secret, dev still works locally because your `.env` has its own value |
| Healthcheck grace | `fly.toml` already gives 30 s for cold-start. If you see flapping, bump to 60 s |
| Auto-stop | Already on (`auto_stop_machines = "stop"`). Verify with `fly status` β idle machines should report `stopped` |
---
## 8. Day-2 ops cheatsheet
```bash
fly status # which machines are running
fly logs # tail combined logs
fly logs --instance <id> # one machine
fly ssh console # shell into a running instance
fly secrets list # what's set (values not shown)
fly secrets unset SOME_KEY # remove
fly scale memory 4096 # bump RAM if rerank latency is bad
fly scale count 2 # add a second machine for redundancy
fly apps destroy rag-psych-<...> # nuke everything (careful)
```
---
## 9. Things you'll feel in production that you don't on localhost
- **First request is slow** (5β10 s) when the machine just woke from
auto-stop. Subsequent requests in the same minute are fast.
- **Latency floor is higher** because every request crosses the public
internet (Fly β Neon) instead of localhost. Expect ~2Γ the times shown
in `eval/results/*.json`.
- **Anthropic costs scale with usage.** A naive demo URL hit by 100 curious
visitors at 5 queries each = ~$2 of Haiku tokens. Spend cap protects you.
- **No DSM in answers.** If a query that worked locally suddenly returns
the refusal string in production, you're hitting the DSM-shaped hole in
the public corpus β that's expected.
---
## Why not Fly Postgres?
Fly's managed Postgres is fine but you'd have to install pgvector via a
custom Postgres image and manage it yourself, then pay $5/mo for the
smallest instance. Neon's free tier is functionally equivalent for our
scale (~30K chunks) and has zero setup beyond toggling the extension.
If you want everything inside Fly's network later (lower latency,
fewer external dependencies), swap `DATABASE_URL` to a Fly Postgres
endpoint and re-run step 6. The application code doesn't care.
|