# Deploying rag-psych to Fly.io

Architecture: **Fly.io for the API container, Neon for Postgres + pgvector.**
Both have free tiers that cover the demo. The API auto-stops when idle so
the bill stays at $0 between visits.

> **Critical:** never run `--sources dsm5` against the remote DB. The DSM
> chunks are licensed for local personal use only and must stay in your
> laptop's `pgdata` volume. The `scripts/seed_remote.sh` helper rejects
> any attempt to ingest DSM remotely.
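
A guard of this kind can be very small. The sketch below is not the contents of `scripts/seed_remote.sh`; it is a hypothetical function (the name `reject_dsm_sources` and the comma-separated argument format are assumptions) showing one way to refuse a source list that contains `dsm5` before any ingest starts.

```bash
# Hypothetical guard, for illustration only; the real check lives in
# scripts/seed_remote.sh and may look different.
# Argument: a comma-separated source list, e.g. the value given to --sources.
reject_dsm_sources() {
  local sources="$1" src
  local IFS=','            # split the list on commas inside this function only
  for src in $sources; do
    if [ "$src" = "dsm5" ]; then
      echo "refusing to ingest DSM content into a remote database" >&2
      return 1
    fi
  done
  return 0
}
```

Calling it first and aborting on a non-zero status keeps the licensed chunks on the laptop even if someone passes the wrong flag by habit.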

---

## 1. Provision Postgres on Neon (free)

1. Sign up at [console.neon.tech](https://console.neon.tech) (GitHub auth, free).
2. Create a new project. Region: pick one near your Fly region (we'll use
   `ord` below; Neon's `aws-us-east-2` is closest).
3. In the project dashboard, go to **Settings → Extensions** and enable
   `vector`. (One toggle. Neon ships pgvector pre-installed; you just
   activate it.)
4. Copy the **Connection string** from the dashboard. It looks like:
   ```
   postgresql://USER:PASSWORD@ep-xyz-12345.us-east-2.aws.neon.tech/rag_psych?sslmode=require
   ```
5. Apply our schema. From your laptop:
   ```bash
   psql 'postgresql://USER:PASSWORD@ep-xyz-12345.us-east-2.aws.neon.tech/rag_psych?sslmode=require' \
     -f ingest/schema.sql
   ```
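
Before pasting the connection string into later commands, it can help to sanity-check its shape. The function below is a minimal sketch (the name `check_neon_url` is made up for this guide) that only inspects the string; it does not contact Neon.

```bash
# Return 0 if the argument looks like a Postgres URL that requires TLS.
# Purely a string check: no network access, no credential validation.
check_neon_url() {
  case "$1" in
    postgresql://*@*/*sslmode=require*)
      return 0 ;;
    *)
      echo "expected postgresql://USER:PASSWORD@HOST/DB?sslmode=require" >&2
      return 1 ;;
  esac
}
```

Once the string passes, `psql "$DATABASE_URL" -c "SELECT extname FROM pg_extension WHERE extname = 'vector';"` should return one row if the extension from step 3 is active.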

---

## 2. Install Fly's CLI and authenticate

```bash
brew install flyctl              # macOS; see fly.io/docs/hands-on/install-flyctl/ for others
fly auth signup                  # or: fly auth login (browser)
```

Free trial credit covers the demo for the first month. After that the
runtime cost is dominated by uptime, which scale-to-zero pushes near zero.

---

## 3. Launch the Fly app (no deploy yet)

From the repo root:

```bash
fly launch --no-deploy --copy-config --name rag-psych-<your-suffix>
```

When prompted:
- **Region**: pick the same one your Neon DB is closest to (e.g. `ord`)
- **Postgres**: **No** (we're using Neon, not Fly Postgres)
- **Redis**: No
- **Settings**: keep the existing `fly.toml` (the `--copy-config` flag preserves it)

Open `fly.toml` and confirm that the `app = "..."` line matches the unique
name Fly assigned (the launcher rewrites it).

---

## 4. Set secrets

These never appear in `fly.toml` or the image. They live encrypted in
Fly's secret store and are injected as env vars at runtime.

```bash
fly secrets set \
  DATABASE_URL='postgresql://USER:PASSWORD@ep-xyz.us-east-2.aws.neon.tech/rag_psych?sslmode=require' \
  ANTHROPIC_API_KEY='sk-ant-api03-...' \
  ANTHROPIC_MODEL='claude-haiku-4-5' \
  NCBI_EMAIL='you@example.com' \
  ICD_CLIENT_ID='...' \
  ICD_CLIENT_SECRET='...' \
  EVAL_PASSWORD="$(python3 -c 'import secrets; print(secrets.token_urlsafe(16))')" \
  CORS_ORIGIN='https://rag-psych-<your-suffix>.fly.dev'
```
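
To confirm nothing was skipped, the names reported by `fly secrets list` can be compared against the eight variables above. This is a sketch under one assumption: that the secret name is the first whitespace-separated column of each output line. The helper name `missing_secrets` is made up for this guide.

```bash
# Read text on stdin, take the first column of each line, and print every
# required secret name that is absent. Returns 1 if anything is missing.
missing_secrets() {
  local required="DATABASE_URL ANTHROPIC_API_KEY ANTHROPIC_MODEL NCBI_EMAIL ICD_CLIENT_ID ICD_CLIENT_SECRET EVAL_PASSWORD CORS_ORIGIN"
  local present name status=0
  present="$(awk '{print $1}')"          # first column of stdin
  for name in $required; do
    if ! printf '%s\n' "$present" | grep -qxF "$name"; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `fly secrets list | missing_secrets`. No output and a zero exit status mean all eight names are present; values are never read or printed.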

> ⚠️ **Rotate your local `.env` keys before deploying.** The
> `ANTHROPIC_API_KEY` and `EVAL_PASSWORD` currently in `.env` should be
> regenerated for production. Assume the old ones are compromised, as is
> anything that has passed through a chat with an LLM. Console:
> [console.anthropic.com/settings/keys](https://console.anthropic.com/settings/keys).

---

## 5. Deploy

```bash
fly deploy
```

First push uploads ~3.5 GB (models are baked into the image; see
`api/Dockerfile`). Subsequent deploys only push the layers that changed,
so code edits redeploy in seconds.

When it finishes:

```bash
fly status                        # check machine health
fly logs                          # follow startup logs
open https://rag-psych-<your-suffix>.fly.dev/health
```

A `{"status":"ok"}` response means the API is up. The first `/query`
will pay a 5–10 s cold-start while the embedder + reranker load.
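
Because the machine may still be waking up, a single request to `/health` can fail even when the deploy succeeded. The helper below is a rough sketch (the name `wait_for_health` and its defaults are assumptions, not part of the repo) that polls with `curl` until the body contains `"status":"ok"`.

```bash
# Poll a URL until its body contains "status":"ok" or attempts run out.
# Arguments: URL, attempts (default 10), delay in seconds (default 3).
# Assumes the JSON is emitted without spaces, as in the example above.
wait_for_health() {
  local url="$1" attempts="${2:-10}" delay="${3:-3}" i=1 body
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP errors, -sS: quiet but keep error text
    body="$(curl -fsS --max-time 30 "$url" 2>/dev/null || true)"
    case "$body" in
      *'"status":"ok"'*)
        echo "healthy after $i attempt(s)"
        return 0 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not healthy after $attempts attempt(s)" >&2
  return 1
}
```

Usage: `wait_for_health "https://rag-psych-<your-suffix>.fly.dev/health"`. A non-zero exit is a cue to read `fly logs` rather than proof that the deploy failed.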

---

## 6. Seed the remote database

The Neon DB is empty after step 1. Run ingest from your laptop against it:

```bash
DATABASE_URL='postgresql://USER:PASSWORD@ep-xyz.us-east-2.aws.neon.tech/rag_psych?sslmode=require' \
  ./scripts/seed_remote.sh
```

This takes 10–15 minutes wall time, mostly the PubMed `efetch` loop on
the first run. The cached JSON files in `data/cache/` mean re-runs are
near-instant.

After it finishes, hit the deployed UI:

```bash
open https://rag-psych-<your-suffix>.fly.dev/ui
```

---

## 7. Production hardening (recommended before sharing publicly)

| Item | How |
|---|---|
| Anthropic spend cap | Drop to **$5/week** at console.anthropic.com → Settings → Limits before sharing the URL |
| `/eval` password | Confirm `EVAL_PASSWORD` is random + long. Never the local-dev value. |
| Rate limit | Already 30/min/IP. If you see abuse, drop to 10/min in `api/main.py`'s `@limiter.limit` decorator and `fly deploy` |
| CORS | Only the Fly subdomain in production. If you remove the localhost dev origin from the Fly secret, dev still works locally because your `.env` has its own value |
| Healthcheck grace | `fly.toml` already gives 30 s for cold-start. If you see flapping, bump to 60 s |
| Auto-stop | Already on (`auto_stop_machines = "stop"`). Verify with `fly status`: idle machines should report `stopped` |
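
The auto-stop row can also be checked offline, before deploying, by looking at the config file itself. A minimal sketch, assuming the setting appears on its own line exactly as quoted in the table; the function name `check_auto_stop` is hypothetical.

```bash
# Succeed if the given config file (default: fly.toml) sets
# auto_stop_machines = "stop" on a line of its own.
check_auto_stop() {
  local file="${1:-fly.toml}"
  if grep -Eq '^[[:space:]]*auto_stop_machines[[:space:]]*=[[:space:]]*"stop"' "$file"; then
    echo "auto-stop is enabled in $file"
  else
    echo "auto_stop_machines is not set to \"stop\" in $file" >&2
    return 1
  fi
}
```

This only reads the file; `fly status` remains the way to see what the running machines actually do.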

---

## 8. Day-2 ops cheatsheet

```bash
fly status                       # which machines are running
fly logs                         # tail combined logs
fly logs --instance <id>         # one machine
fly ssh console                  # shell into a running instance
fly secrets list                 # what's set (values not shown)
fly secrets unset SOME_KEY       # remove
fly scale memory 4096            # bump RAM if rerank latency is bad
fly scale count 2                # add a second machine for redundancy
fly apps destroy rag-psych-<...> # nuke everything (careful)
```

---

## 9. Things you'll feel in production that you don't on localhost

- **First request is slow** (5–10 s) when the machine just woke from
  auto-stop. Subsequent requests in the same minute are fast.
- **Latency floor is higher** because every request crosses the public
  internet (Fly ↔ Neon) instead of localhost. Expect ~2× the times shown
  in `eval/results/*.json`.
- **Anthropic costs scale with usage.** A naive demo URL hit by 100 curious
  visitors at 5 queries each = ~$2 of Haiku tokens. Spend cap protects you.
- **No DSM in answers.** If a query that worked locally suddenly returns
  the refusal string in production, you're hitting the DSM-shaped hole in
  the public corpus. That's expected.
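
The cost bullet above can be turned into a quick back-of-envelope calculation. The sketch below uses only the rough figures from this guide (100 visitors, 5 queries each, about $2 in total, a $5 weekly cap); the function name `estimate_budget` is made up, and real token costs will vary with prompt and answer length.

```bash
# Estimate per-query cost and how many queries a spend cap allows.
# Arguments: visitors, queries per visitor, total cost in USD, cap in USD.
estimate_budget() {
  awk -v v="$1" -v q="$2" -v cost="$3" -v cap="$4" 'BEGIN {
    total = v * q                      # total queries in the scenario
    per_query = cost / total           # average USD per query
    allowed = cap * total / cost       # queries that fit under the cap
    printf "queries=%d per_query=$%.4f cap_allows=%d\n", total, per_query, allowed
  }'
}

estimate_budget 100 5 2 5
# prints: queries=500 per_query=$0.0040 cap_allows=1250
```

At roughly $0.004 per query, a $5/week cap leaves room for about 1,250 queries before Anthropic starts rejecting requests.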

---

## Why not Fly Postgres?

Fly's managed Postgres is fine but you'd have to install pgvector via a
custom Postgres image and manage it yourself, then pay $5/mo for the
smallest instance. Neon's free tier is functionally equivalent for our
scale (~30K chunks) and has zero setup beyond toggling the extension.

If you want everything inside Fly's network later (lower latency,
fewer external dependencies), swap `DATABASE_URL` to a Fly Postgres
endpoint and re-run step 6. The application code doesn't care.