# Deploy Guide: Render.com

End-to-end walkthrough for deploying Hasarİ to Render.com using `render.yaml`. Covers prerequisites, infra setup, environment configuration, the first deploy, smoke tests, monitoring, rollback, and cost.

> Target audience: anyone with shell access and a Render account, no prior Render experience required.

---

## Prerequisites

Before you start, you need:

| Item | Why | How to get it |
|---|---|---|
| **Render account** | Hosts the API + worker | [render.com](https://render.com); the free tier is OK for the web service, Postgres/Redis are paid |
| **GitHub access to the repo** | Render builds from git | `arac-hasar-v2` repo permissions |
| **AWS S3 bucket** (or compatible) | Image storage (uploads + visualizations) | Or use Cloudflare R2 / Backblaze B2 (anything S3-compatible) |
| **AWS IAM access key** with `s3:GetObject`, `s3:PutObject`, `s3:DeleteObject` on the bucket | Backend uploads/signs URLs | AWS console → IAM |
| **Custom domain** (optional) | Branded URL | Any registrar, point CNAME at Render |
| **A strong `JWT_SECRET_KEY`** | Signs auth tokens | `openssl rand -base64 48` |
| **Sentry DSN** (optional) | Error tracking | [sentry.io](https://sentry.io) |

**Time estimate**: 45–90 minutes for the first deploy. Subsequent deploys are git-push fast.

---

## Step 1: Provision infrastructure

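The `render.yaml` at the repo root declares the API and worker services; Render reads it when you connect the repo as a Blueprint (step 2a) and creates both in one go. The database, Redis instance, and S3 bucket are created by hand first, as described below.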

### 1a. Create the Postgres database

In the Render dashboard:

1. **New +** → **PostgreSQL**
2. **Name**: `hasari-db`
3. **Database**: `hasari`
4. **User**: `hasari`
5. **Region**: pick the one nearest your users (Frankfurt for EU/TR, Oregon for US)
6. **Plan**: **Starter** ($7/month) is sufficient for the pilot. Free tier expires after 90 days, so do not use it for production.
7. **Create database**.

Copy the **Internal Database URL** (format: `postgres://hasari:…@…/hasari`). You'll paste it as `DATABASE_URL` in step 2.

### 1b. Create the Redis instance

1. **New +** → **Redis**
2. **Name**: `hasari-redis`
3. **Region**: same as Postgres
4. **Plan**: **Starter** ($10/month). Free tier has no persistence; do not use for production.
5. **Maxmemory policy**: `allkeys-lru`
6. **Create Redis**.

Copy the **Internal Redis URL**; you'll use it as `REDIS_URL`.

### 1c. Create the S3 bucket

In the AWS console:

1. S3 → **Create bucket** → `hasari-uploads-prod` (or your name), region matching the API for low latency
2. **Block all public access**: yes, keep all four boxes checked. The backend serves presigned URLs; no public listing.
3. **Versioning**: disabled (uploads are immutable; no need for revisions)
4. **Server-side encryption**: SSE-S3 (default) is fine
5. After creation: **Permissions** → **CORS** → add:

```json
[
  {
    "AllowedHeaders": ["*"],
    "AllowedMethods": ["GET", "PUT", "POST"],
    "AllowedOrigins": ["https://hasari.app", "https://hasari-api.onrender.com"],
    "ExposeHeaders": ["ETag", "x-amz-request-id"],
    "MaxAgeSeconds": 3000
  }
]
```

Replace the origins with your actual web app URL.

6. IAM → create an IAM user `hasari-backend`, attach an inline policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::hasari-uploads-prod/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::hasari-uploads-prod"
    }
  ]
}
```

Create an access key for this user and save both halves now; you cannot retrieve the secret again.
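
If you end up with more than one bucket (say, staging and production), generating the inline policy may be less error-prone than hand-editing the ARNs. The sketch below rebuilds the policy shown above for an arbitrary bucket name; `build_bucket_policy` is an illustrative name, not something that exists in the repo.

```python
import json


def build_bucket_policy(bucket: str) -> dict:
    """Return the inline IAM policy from step 1c for the given bucket name."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level actions apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # ListBucket applies to the bucket itself, not to its keys.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_bucket_policy("hasari-uploads-prod"), indent=2))
```

Paste the printed JSON into the IAM console as the inline policy for `hasari-backend`.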

---

## Step 2: Configure the backend service

### 2a. Connect the repo

1. Render dashboard → **New +** → **Blueprint**
2. Connect your GitHub account, pick `arac-hasar-v2`
3. Render detects `render.yaml` and shows the services it will create:
   - `hasari-api`: web service (FastAPI)
   - `hasari-worker`: background worker (Celery)
4. Click **Apply** to create both.

### 2b. Set environment variables

For **each** of `hasari-api` and `hasari-worker`, go to **Environment** and add:

#### Required

| Name | Example | Description | Security note |
|---|---|---|---|
| `ENVIRONMENT` | `production` | Disables dev auth fallback | Never set to `dev` here |
| `DATABASE_URL` | `postgres://…` | From step 1a | Internal URL only; no external traffic |
| `REDIS_URL` | `redis://…` | From step 1b | Internal URL only |
| `JWT_SECRET_KEY` | (32+ char random string) | Signs JWTs | Generate fresh: `openssl rand -base64 48`. Rotating invalidates all existing sessions. |
| `S3_BUCKET` | `hasari-uploads-prod` | Bucket name | None |
| `S3_REGION` | `eu-central-1` | AWS region | None |
| `S3_ACCESS_KEY` | `AKIA…` | From step 1c | Use a dedicated IAM user, not your root key |
| `S3_SECRET_KEY` | `…` | From step 1c | Mark this var as "secret" in Render UI |
| `S3_ENDPOINT_URL` | (blank for AWS) | Set only for R2/MinIO/B2 | None |
| `CORS_ORIGINS` | `https://hasari.app,https://www.hasari.app` | Comma-separated allowed web origins | Never use `*` in production |
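
Because the same variables have to be set on two services, a small preflight check can catch a missing or weak value before the first deploy. The following is a minimal sketch that encodes only the rules stated in the table above; the script and the name `check_env` are hypothetical and not part of the repo.

```python
import os

# Variables the table marks as required. S3_ENDPOINT_URL is left out
# because it is intentionally blank on AWS.
REQUIRED_VARS = [
    "ENVIRONMENT", "DATABASE_URL", "REDIS_URL", "JWT_SECRET_KEY",
    "S3_BUCKET", "S3_REGION", "S3_ACCESS_KEY", "S3_SECRET_KEY", "CORS_ORIGINS",
]


def check_env(env: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    if env.get("ENVIRONMENT") and env["ENVIRONMENT"] != "production":
        problems.append("ENVIRONMENT should be 'production'")
    secret = env.get("JWT_SECRET_KEY", "")
    if secret and len(secret) < 32:
        problems.append("JWT_SECRET_KEY is shorter than 32 characters")
    if "*" in env.get("CORS_ORIGINS", ""):
        problems.append("CORS_ORIGINS must not contain '*'")
    return problems


if __name__ == "__main__":
    for problem in check_env(dict(os.environ)):
        print("config problem:", problem)
```

Run it in the Render shell of each service; no output means every rule passed.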

#### Recommended

| Name | Default | Description |
|---|---|---|
| `ACCESS_TOKEN_MINUTES` | `30` | Short access TTL; keeps damage from a stolen token bounded |
| `REFRESH_TOKEN_DAYS` | `7` | Refresh TTL; balance UX vs. risk |
| `MAX_IMAGE_SIZE_MB` | `12` | Per-image upload limit |
| `MAX_IMAGES_SYNC` | `5` | Sync mode cap |
| `MAX_IMAGES_ASYNC` | `20` | Async mode cap |
| `SENTRY_DSN` | (blank) | Enable Sentry error tracking |
| `LOG_LEVEL` | `INFO` | `DEBUG` for troubleshooting; keep `INFO` in prod |
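
To make the two image caps concrete, here is a sketch of how they might interact. The defaults come from the table above, but the routing rule in `pick_mode` (small batches sync, larger ones async, anything beyond rejected) is an assumption about the backend's behavior, and both function names are hypothetical.

```python
import os

# Defaults copied from the "Recommended" table.
DEFAULTS = {"MAX_IMAGE_SIZE_MB": 12, "MAX_IMAGES_SYNC": 5, "MAX_IMAGES_ASYNC": 20}


def load_limits(env: dict) -> dict:
    """Read the integer limits from env, falling back to the documented defaults."""
    return {name: int(env.get(name) or default) for name, default in DEFAULTS.items()}


def pick_mode(image_count: int, limits: dict) -> str:
    """Assumed routing rule: up to the sync cap is 'sync', up to the async cap is 'async'."""
    if image_count < 1:
        raise ValueError("at least one image is required")
    if image_count <= limits["MAX_IMAGES_SYNC"]:
        return "sync"
    if image_count <= limits["MAX_IMAGES_ASYNC"]:
        return "async"
    raise ValueError(f"{image_count} images exceeds MAX_IMAGES_ASYNC")


if __name__ == "__main__":
    print(load_limits(dict(os.environ)))
```

With the defaults, a 5-photo batch would stay synchronous and a 6-photo batch would go to the worker.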

#### ML service

| Name | Default | Description |
|---|---|---|
| `ML_MODEL_DIR` | `/app/models` | Path to YOLO `.pt` weight files inside the container |
| `ML_DEVICE` | `cpu` | `cuda` requires a GPU instance (Render does not offer GPUs, so keep CPU on Render and offload heavy ML to a separate GPU host or external service for production) |

> **GPU note**: Render does not currently offer GPU instances. For the pilot, the backend runs YOLO on CPU, which is roughly 5–10× slower than on a GPU. For production loads above ~50 inspections/hour, host the ML service separately on a GPU VPS (Hetzner, RunPod, etc.) and point `ML_SERVICE_URL` at it. See the architecture diagram in [README.md](../README.md#architecture).

### 2c. Build & start commands

If `render.yaml` is missing these, set them manually:

**hasari-api** (web service):
- Build: `pip install -r services/backend/requirements.txt`
- Start: `cd services/backend && alembic upgrade head && uvicorn main:app --host 0.0.0.0 --port $PORT`

**hasari-worker** (background worker):
- Build: same
- Start: `cd services/backend && celery -A worker worker --loglevel=info --concurrency=2`

---

## Step 3: First deploy

1. Both services auto-deploy on push to `main`. Trigger the first deploy:
   - Go to **hasari-api** → **Manual Deploy** → **Deploy latest commit**.
2. Watch the build log. Expected duration: 8–15 minutes (downloads PyTorch, YOLO weights).
3. The first start runs `alembic upgrade head`, which creates the DB schema.
4. Once "Your service is live" appears, hit the health endpoint:

```bash
curl https://hasari-api.onrender.com/health
```

Expected:

```json
{"status":"ok","ml_loaded":true,"timestamp":"2026-05-15T...","version":"0.1.0"}
```

If `ml_loaded` is `false`, check the start log for "ML pipeline init failed"; this usually means the model weights are missing from the container. To confirm, open a shell on the service (**Shell** tab in the Render dashboard) and run `python -c "from ml_service import ml_pipeline; print(ml_pipeline.is_loaded())"`.
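
If you would rather script this check than eyeball the `curl` output, the sketch below parses the response and reports what is wrong. It assumes only the two fields shown in the expected JSON above (`status` and `ml_loaded`); the function names are illustrative.

```python
import json
import urllib.request


def evaluate_health(body: str) -> list:
    """Return problems found in a /health response body; empty means healthy."""
    data = json.loads(body)
    problems = []
    if data.get("status") != "ok":
        problems.append(f"status is {data.get('status')!r}, expected 'ok'")
    if data.get("ml_loaded") is not True:
        problems.append("ml_loaded is not true; check model weights and ML_MODEL_DIR")
    return problems


def fetch_health(base_url: str) -> str:
    """Fetch the raw /health body (network call; raises on a non-2xx status)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=10) as resp:
        return resp.read().decode("utf-8")
```

For example, `evaluate_health(fetch_health("https://hasari-api.onrender.com"))` returns an empty list when the service is healthy.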

---

## Step 4: Create the admin user

The first admin must be created out-of-band, because there is no admin-registration UI. SSH into the API container via the Render dashboard (**Shell** tab) and run:

```bash
cd services/backend
python -c "
from database import init_db
from auth import _repo
from security import hash_password
init_db()
user = _repo.create(
    email='admin@yourcompany.com',
    password_hash=hash_password('CHANGE_ME_now_strong_password'),
    full_name='Admin',
)
# Promote to admin; read the URL from the environment instead of shell-interpolating it
import os
import psycopg
with psycopg.connect(os.environ['DATABASE_URL']) as conn:
    conn.execute('UPDATE users SET role=%s WHERE id=%s', ('admin', user['id']))
    conn.commit()
print('Admin user created:', user['email'])
"
```

Sign in immediately at `https://hasari.app/login` and rotate the password through the UI.

---

## Step 5: Smoke test checklist

Before announcing the deploy is "done", run through this list. Each item should pass on the first try.

- [ ] `GET /health` returns 200 with `ml_loaded: true`
- [ ] `GET /api/v1/version` returns expected git SHA and build time
- [ ] `POST /auth/register` with a new email returns 201 + token pair
- [ ] `POST /auth/login` with that email returns 200 + new token pair
- [ ] `GET /auth/me` with the access token returns the user
- [ ] `POST /auth/refresh` with the refresh token returns a new token pair
- [ ] `POST /api/v1/inspect/sync` with a 1MB JPG returns 200 with parts/damages JSON within 15 seconds
- [ ] `GET /api/v1/inspect` returns the inspection in the list
- [ ] `GET /api/v1/inspect/{id}/visualization/annotated` redirects to a presigned S3 URL that returns a PNG
- [ ] `DELETE /api/v1/inspect/{id}` removes it (subsequent GET returns 404)
- [ ] Web app loads at custom domain, language defaults to TR
- [ ] Sign in via web app, complete one inspection end-to-end
- [ ] Open Render logs: no `ERROR` or `CRITICAL` entries in the past hour
- [ ] Postgres connection count < 20 (visible in Render → hasari-db → Metrics)

If any item fails, **do not** announce the launch. See [Troubleshooting](#troubleshooting).
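
The unauthenticated checks at the top of the list are easy to script. The sketch below is one possible runner, not the project's test suite: it covers only `/health` and `/api/v1/version`, assumes both return 200, and takes the HTTP call as a parameter so the logic can be exercised without a network. All names are illustrative.

```python
import urllib.error
import urllib.request
from typing import Callable, List, Tuple

# (method, path, expected status) for the unauthenticated checks in the list above.
CHECKS: List[Tuple[str, str, int]] = [
    ("GET", "/health", 200),
    ("GET", "/api/v1/version", 200),
]


def run_checks(fetch: Callable[[str, str], int], checks=CHECKS) -> List[str]:
    """Call fetch(method, path) for each check and return failure descriptions."""
    failures = []
    for method, path, expected in checks:
        try:
            status = fetch(method, path)
        except Exception as exc:  # network errors count as failures, not crashes
            failures.append(f"{method} {path}: {exc}")
            continue
        if status != expected:
            failures.append(f"{method} {path}: got {status}, expected {expected}")
    return failures


def http_fetch(base_url: str) -> Callable[[str, str], int]:
    """Build a fetch function that returns the HTTP status code for base_url + path."""
    def fetch(method: str, path: str) -> int:
        req = urllib.request.Request(base_url + path, method=method)
        try:
            with urllib.request.urlopen(req, timeout=15) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            return err.code  # urllib raises on 4xx/5xx; we want the code itself
    return fetch
```

Calling `run_checks(http_fetch("https://hasari-api.onrender.com"))` returns an empty list when both endpoints respond as expected. The authenticated items still need a token pair and are left to manual testing or a fuller suite.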

---

## Monitoring & log access

### Logs

- **Render dashboard**: hasari-api → **Logs** tab for a live tail.
- **CLI**: `render logs --service hasari-api --tail` (install [render-cli](https://render.com/docs/cli)).
- **Structured JSON**: every log line is `{"time":..., "level":..., "logger":..., "msg":...}`, so you can pipe it to `jq` for filtering.
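
Where `jq` is not available, the same filtering can be done in a few lines of Python. This sketch assumes the four keys listed above and that `level` uses the standard Python logging names; `filter_logs` is an illustrative helper, not part of the codebase.

```python
import json
from typing import Iterable, Iterator

# Standard Python logging levels, used to compare severities.
LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}


def filter_logs(lines: Iterable[str], min_level: str = "ERROR") -> Iterator[dict]:
    """Yield parsed log records at or above min_level, skipping non-JSON lines."""
    threshold = LEVELS[min_level]
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. platform banner lines that are not JSON
        if not isinstance(record, dict):
            continue
        if LEVELS.get(str(record.get("level", "")).upper(), 0) >= threshold:
            yield record
```

Feeding it `sys.stdin` lets you pipe the CLI tail straight through it and keep only `ERROR` and `CRITICAL` records.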

### Metrics

- **Render built-in**: CPU, memory, response time, throughput visible per service in the dashboard.
- **Prometheus**: scrape `https://hasari-api.onrender.com/metrics` (requires `Authorization: Bearer <admin token>`). See `observability/` for a Grafana dashboard JSON to import.
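
For a quick manual check that `/metrics` is reachable with your admin token before wiring up Prometheus, something like the following may help. It is a sketch: the bearer-token header comes from the bullet above, while the helper names and the idea of counting sample lines as a sanity check are illustrative.

```python
import urllib.request


def metrics_request(base_url: str, admin_token: str) -> urllib.request.Request:
    """Build a GET request for /metrics carrying the admin bearer token."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/metrics",
        headers={"Authorization": f"Bearer {admin_token}"},
    )


def count_samples(exposition_text: str) -> int:
    """Count sample lines in Prometheus text format (skips comments and blanks)."""
    return sum(
        1 for line in exposition_text.splitlines()
        if line.strip() and not line.startswith("#")
    )
```

Pass the request to `urllib.request.urlopen`, decode the body, and hand it to `count_samples`; a count of zero suggests the exporter is not registered.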

### Alerts

Configure in Render → service → **Notifications**:
- **Deploy failed** → Slack/email
- **Service crashed** → on-call rotation
- **Disk usage > 80%** → Slack

For app-level alerts (error rate > 1%, p95 latency > 3s), set up Sentry alerts on the `SENTRY_DSN` project.

---

## Rolling back a bad deploy

If the latest deploy is broken:

1. Render dashboard → **hasari-api** → **Events** tab.
2. Find the last known-good deploy (green checkmark).
3. Click **Rollback to this deploy**.
4. Confirm. Render redeploys the previous Docker image, which takes ~30 seconds.

For database migrations that cannot be rolled back automatically:

```bash
# In the Render shell:
cd services/backend
alembic downgrade -1
```

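**Important**: a downgrade cannot bring data back. Downgrading a migration that added a column drops everything written to it since, and downgrading one that dropped a column holding live data recreates the column without the old values. Pre-launch, test every migration's `downgrade()` against a copy of production data.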

---

## Cost estimate (monthly, pilot scale)

| Item | Plan | Cost |
|---|---|---|
| Render web service (`hasari-api`) | Starter (512 MB) | $7 |
| Render background worker (`hasari-worker`) | Starter (512 MB) | $7 |
| Render Postgres | Starter | $7 |
| Render Redis | Starter | $10 |
| AWS S3 (10 GB storage, 100k req/month) | Pay-as-you-go | ~$1 |
| AWS data transfer (out) | Pay-as-you-go | ~$2 |
| Custom domain | (you own it) | $0 |
| Sentry (free tier) | Developer | $0 |
| **Total** | | **~$34/month** |

Scaling beyond ~500 inspections/day will require:
- Larger Render plans (Standard: $25/service)
- Moving ML to a GPU VPS (Hetzner GPU: $80/month)
- S3 storage growth: $0.023/GB/month
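
The arithmetic behind the table, and the effect of S3 growth on it, can be written out as follows. The prices are the ones quoted above; the function name and the "extra GB beyond the 10 GB baseline" framing are illustrative.

```python
# Monthly line items from the cost table, in USD.
MONTHLY_USD = {
    "hasari-api (Starter)": 7,
    "hasari-worker (Starter)": 7,
    "Postgres (Starter)": 7,
    "Redis (Starter)": 10,
    "S3 storage + requests": 1,
    "AWS data transfer": 2,
}
S3_USD_PER_GB_MONTH = 0.023


def monthly_total(extra_s3_gb: float = 0.0) -> float:
    """Pilot total plus the cost of S3 storage beyond the 10 GB baseline."""
    return sum(MONTHLY_USD.values()) + extra_s3_gb * S3_USD_PER_GB_MONTH


if __name__ == "__main__":
    print(f"pilot: ${monthly_total():.2f}")            # pilot: $34.00
    print(f"with +100 GB: ${monthly_total(100):.2f}")  # with +100 GB: $36.30
```

Storage is a small part of the bill at this scale; the plan upgrades and the GPU host dominate once traffic grows.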

---

## Troubleshooting

### `ml_loaded: false` at startup

**Cause**: model weights missing or wrong path.
**Fix**: ensure `services/ml/yolo11m-seg.pt`, `yolo11s-seg.pt`, `yolo11n-cls.pt` are committed to the repo or downloaded in the build step. Check `ML_MODEL_DIR` env var.

### 503 "Is kuyrugu su an kullanilamiyor"

The message is Turkish for "The job queue is currently unavailable".

**Cause**: Celery worker can't reach Redis.
**Fix**: confirm `REDIS_URL` is set on **hasari-worker** (not just on the API service). Restart the worker.

### Postgres connection limit exceeded

**Cause**: too many open connections, usually a long-running query or leaked sessions.
**Fix**: check Render → Postgres → Metrics → "Active connections". Restart API service to drop them. Add `pool_pre_ping=True` and `pool_recycle=300` to SQLAlchemy engine config.

### S3 403 on upload

**Cause**: IAM policy doesn't grant `s3:PutObject`, or bucket name typo, or wrong region.
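**Fix**: run `aws s3 ls s3://your-bucket-name --region eu-central-1` from anywhere with the same credentials to verify the credentials, bucket name, and region. That command only exercises `s3:ListBucket`; to test `s3:PutObject` itself, upload a small file with `aws s3 cp`.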

### Web app CORS error

**Cause**: `CORS_ORIGINS` doesn't include the actual origin (don't forget `https://`, no trailing slash).
**Fix**: update env var, redeploy API.
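
Most CORS misconfigurations are formatting slips in the variable itself. The sketch below validates a `CORS_ORIGINS` value against the rules mentioned above (scheme present, no trailing slash, no wildcard); how the backend actually parses the variable is not shown in this guide, so treat `parse_cors_origins` as a hypothetical helper.

```python
from urllib.parse import urlsplit


def parse_cors_origins(raw: str) -> list:
    """Split a comma-separated CORS_ORIGINS value and reject malformed entries."""
    origins = []
    for item in raw.split(","):
        origin = item.strip()
        if not origin:
            continue
        parts = urlsplit(origin)
        # A bare host or "*" has no scheme, so it fails here.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"{origin!r} needs an http(s):// scheme and a host")
        # An origin is scheme + host (+ port); even a lone "/" is a path.
        if parts.path or parts.query or parts.fragment:
            raise ValueError(f"{origin!r} must not have a path or trailing slash")
        origins.append(origin)
    return origins
```

For the example value in step 2b, this returns `["https://hasari.app", "https://www.hasari.app"]`.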

### Health check passing but inspections always fail with 500

**Cause**: usually an unhandled exception in the ML pipeline (corrupt model, OOM, missing class).
**Fix**: enable `LOG_LEVEL=DEBUG`, reproduce, read the traceback in logs. If it's OOM, upgrade to Standard plan or move ML off-box.

### Migration locked / `alembic upgrade head` hangs

**Cause**: a previous migration left a lock in the `alembic_version` table.
**Fix**: in psql, `DELETE FROM alembic_locks;` (table name varies; check your alembic config) or set `LOCK_TIMEOUT` and retry.

---

## Post-launch monitoring (first 48 hours)

Watch these metrics every 4 hours for the first two days:

- **Error rate** (Sentry): target < 0.5% of requests
- **API latency p95** (Render metrics): target < 1.5 s for non-inspection endpoints
- **Inspection latency p95**: target < 12 s end-to-end for 4-photo batches
- **Database active connections**: target < 50% of pool max
- **Redis memory**: target < 80% of plan limit
- **Failed inspection rate**: target < 2% of jobs reaching `failed`

If any metric exceeds target for 30+ minutes, treat as a P1 incident.
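
The targets above can be kept in one place and compared against whatever numbers you read off the dashboards. This is a minimal sketch: the metric key names are made up for illustration, and it checks a single sample rather than tracking the 30-minute window.

```python
# Targets from the list above; each metric should stay strictly below its limit.
TARGETS = {
    "error_rate_pct": 0.5,
    "api_latency_p95_s": 1.5,
    "inspection_latency_p95_s": 12.0,
    "db_connections_pct_of_pool": 50.0,
    "redis_memory_pct": 80.0,
    "failed_inspection_pct": 2.0,
}


def breaches(sample: dict) -> dict:
    """Return {metric: (observed, limit)} for every metric at or above its limit."""
    return {
        name: (sample[name], limit)
        for name, limit in TARGETS.items()
        if name in sample and sample[name] >= limit
    }


if __name__ == "__main__":
    print(breaches({"error_rate_pct": 0.2, "redis_memory_pct": 91.0}))
```

A metric that shows up in the result on consecutive 4-hourly checks, or for 30+ minutes on a live graph, is the P1 trigger described above.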

---

## Related docs

- [API_GUIDE.md](API_GUIDE.md): REST contract for smoke tests
- [AUTH_FLOW.md](AUTH_FLOW.md): token lifecycle (env vars relevant)
- [LAUNCH_CHECKLIST.md](LAUNCH_CHECKLIST.md): pre-go-live sign-off gates
- [OBSERVABILITY_SETUP.md](OBSERVABILITY_SETUP.md): Prometheus + Grafana wiring