# Deploy Guide: Render.com
End-to-end walkthrough for deploying Hasarİ to Render.com using `render.yaml`. Covers prerequisites, infrastructure setup, environment configuration, the first deploy, smoke tests, monitoring, rollback, and cost.
> Target audience: anyone with shell access and a Render account. No prior Render experience is required.
---
## Prerequisites
Before you start, you need:
| Item | Why | How to get it |
|---|---|---|
| **Render account** | Hosts the API + worker | [render.com](https://render.com); the free tier is fine for the web service, but Postgres and Redis need paid plans |
| **GitHub access to the repo** | Render builds from git | `arac-hasar-v2` repo permissions |
| **AWS S3 bucket** (or compatible) | Image storage (uploads + visualizations) | Or use Cloudflare R2 / Backblaze B2; anything S3-compatible works |
| **AWS IAM access key** with `s3:GetObject`, `s3:PutObject`, `s3:DeleteObject` on the bucket's objects and `s3:ListBucket` on the bucket | Backend uploads objects and signs URLs | AWS console → IAM |
| **Custom domain** (optional) | Branded URL | Any registrar; point a CNAME at Render |
| **A strong `JWT_SECRET_KEY`** | Signs auth tokens | `openssl rand -base64 48` |
| **Sentry DSN** (optional) | Error tracking | [sentry.io](https://sentry.io) |
**Time estimate**: 45–90 minutes for the first deploy. After that, every push to `main` deploys automatically.
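If `openssl` is not available on your machine, the Python standard library can produce an equivalent `JWT_SECRET_KEY`. Below is a minimal sketch that mirrors `openssl rand -base64 48`: it takes 48 bytes from a cryptographically secure generator and base64-encodes them into 64 characters. The function name is illustrative and is not part of the codebase.

```python
import base64
import secrets


def generate_jwt_secret(num_bytes: int = 48) -> str:
    """Return num_bytes of cryptographically secure random data, base64-encoded.

    Mirrors `openssl rand -base64 48`: 48 random bytes encode to 64 characters
    (no '=' padding, because 48 is a multiple of 3).
    """
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    # Paste the printed value into JWT_SECRET_KEY; never commit it to the repo.
    print(generate_jwt_secret())
```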
---
## Step 1: Provision infrastructure
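The `render.yaml` at the repo root declares the application services, `hasari-api` and `hasari-worker`, and Render creates both when you connect the repo in step 2. First provision the Postgres database, Redis instance, and S3 bucket as described below, so their URLs and credentials are ready when you configure the services.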
### 1a. Create the Postgres database
In the Render dashboard:
1. **New +** → **PostgreSQL**
2. **Name**: `hasari-db`
3. **Database**: `hasari`
4. **User**: `hasari`
5. **Region**: pick the one nearest your users (Frankfurt for EU/TR, Oregon for US)
6. **Plan**: **Starter** ($7/month) is sufficient for the pilot. The free tier expires after 90 days; do not use it for production.
7. **Create database**.
Copy the **Internal Database URL** (format: `postgres://hasari:…@…/hasari`). You'll paste it as `DATABASE_URL` in step 2.
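Render hands out this URL with the `postgres://` scheme. SQLAlchemy 1.4 and later no longer accept that alias; they expect `postgresql://`, optionally followed by a driver suffix. If the backend passes `DATABASE_URL` straight to SQLAlchemy without normalizing it, a small helper like the one below avoids a "Can't load plugin" error at startup. This is a sketch. The function name is hypothetical, and the `psycopg` driver default is an assumption based on the `psycopg` import in step 4, so adjust it to whatever driver the backend actually uses.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_database_url(url: str, driver: str = "psycopg") -> str:
    """Rewrite Render's postgres:// URL into a form SQLAlchemy 1.4+ accepts.

    postgres://user:pw@host/db -> postgresql+<driver>://user:pw@host/db
    URLs that already use the postgresql dialect name are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme.startswith("postgresql"):
        return url  # already postgresql:// or postgresql+<driver>://
    if parts.scheme != "postgres":
        raise ValueError(f"not a PostgreSQL URL (scheme {parts.scheme!r})")
    # Only the scheme changes; credentials, host, port and database name are kept.
    return urlunsplit(parts._replace(scheme=f"postgresql+{driver}"))
```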
### 1b. Create the Redis instance
1. **New +** → **Redis**
2. **Name**: `hasari-redis`
3. **Region**: same as Postgres
4. **Plan**: **Starter** ($10/month). Free tier has no persistence; do not use for production.
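5. **Maxmemory policy**: `noeviction`, because this instance is the Celery broker (the worker reads its queue from `REDIS_URL`). An evicting policy such as `allkeys-lru` can silently drop queued tasks under memory pressure, so reserve it for cache-only instances.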
6. **Create Redis**.
Copy the **Internal Redis URL**; you'll use it as `REDIS_URL`.
### 1c. Create the S3 bucket
In the AWS console:
1. S3 → **Create bucket** → `hasari-uploads-prod` (or your own name), in the region that matches the API for low latency
2. **Block all public access**: yes, keep all four boxes checked. The backend serves presigned URLs; no public listing.
3. **Versioning**: disabled (uploads are immutable; no need for revisions)
4. **Server-side encryption**: SSE-S3 (default) is fine
5. After creation: **Permissions** → **CORS** → add:
   ```json
   [
     {
       "AllowedHeaders": ["*"],
       "AllowedMethods": ["GET", "PUT", "POST"],
       "AllowedOrigins": ["https://hasari.app", "https://hasari-api.onrender.com"],
       "ExposeHeaders": ["ETag", "x-amz-request-id"],
       "MaxAgeSeconds": 3000
     }
   ]
   ```
   Replace the origins with your actual web app URL.
6. In IAM, create an IAM user `hasari-backend` and attach this inline policy:
   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
         "Resource": "arn:aws:s3:::hasari-uploads-prod/*"
       },
       {
         "Effect": "Allow",
         "Action": "s3:ListBucket",
         "Resource": "arn:aws:s3:::hasari-uploads-prod"
       }
     ]
   }
   ```
   If you chose a different bucket name, replace it in both ARNs. Then create an access key for this user and save both the access key ID and the secret access key right away, because AWS will not show the secret again.
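If your bucket name or origins differ from the examples, both JSON documents have to change consistently. Below is a minimal sketch that renders them for any bucket and origin list; the function names are illustrative. It also makes one subtle point explicit: object actions apply to `arn:aws:s3:::<bucket>/*`, while `s3:ListBucket` applies to the bucket ARN itself. Note that the S3 console's CORS editor takes the bare JSON array, whereas the S3 `PutBucketCors` API expects the same rules wrapped in a `CORSRules` key.

```python
import json


def cors_rules(origins: list[str]) -> list[dict]:
    """CORS rules in the JSON-array shape the S3 console editor accepts."""
    return [
        {
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["GET", "PUT", "POST"],
            "AllowedOrigins": list(origins),
            "ExposeHeaders": ["ETag", "x-amz-request-id"],
            "MaxAgeSeconds": 3000,
        }
    ]


def backend_iam_policy(bucket: str) -> dict:
    """Least-privilege inline policy for the hasari-backend IAM user."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level actions target the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # ListBucket is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


if __name__ == "__main__":
    bucket = "hasari-uploads-prod"  # replace with your bucket name
    origins = ["https://hasari.app", "https://hasari-api.onrender.com"]
    print(json.dumps(cors_rules(origins), indent=2))
    print(json.dumps(backend_iam_policy(bucket), indent=2))
```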
---
## Step 2: Configure the backend service
### 2a. Connect the repo
1. Render dashboard → **New +** → **Blueprint**
2. Connect your GitHub account and pick `arac-hasar-v2`
3. Render detects `render.yaml` and shows the services it will create:
   - `hasari-api`: web service (FastAPI)
   - `hasari-worker`: background worker (Celery)
4. Click **Apply** to create both.
### 2b. Set environment variables
For **each** of `hasari-api` and `hasari-worker`, go to **Environment** and add:
#### Required
| Name | Example | Description | Security note |
|---|---|---|---|
| `ENVIRONMENT` | `production` | Disables dev auth fallback | Never set to `dev` here |
| `DATABASE_URL` | `postgres://…` | From step 1a | Internal URL only; no external traffic |
| `REDIS_URL` | `redis://…` | From step 1b | Internal URL only |
| `JWT_SECRET_KEY` | (32+ char random string) | Signs JWTs | Generate fresh: `openssl rand -base64 48`. Rotating invalidates all existing sessions. |
| `S3_BUCKET` | `hasari-uploads-prod` | Bucket name | n/a |
| `S3_REGION` | `eu-central-1` | AWS region | n/a |
| `S3_ACCESS_KEY` | `AKIA…` | From step 1c | Use a dedicated IAM user, not your root key |
| `S3_SECRET_KEY` | `…` | From step 1c | Mark this var as "secret" in Render UI |
| `S3_ENDPOINT_URL` | (blank for AWS) | Set only for R2/MinIO/B2 | n/a |
| `CORS_ORIGINS` | `https://hasari.app,https://www.hasari.app` | Comma-separated allowed web origins | Never use `*` in production |
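A fail-fast check at process startup catches most misconfigurations in this table before the first request arrives. The sketch below is hypothetical, and the backend may already validate its settings in its own way. It enforces the rules from the Security note column: every required variable is set, `ENVIRONMENT` is `production`, the JWT secret is at least 32 characters, and `CORS_ORIGINS` does not contain `*`. `S3_ENDPOINT_URL` is left out because it is blank on AWS.

```python
from collections.abc import Mapping

REQUIRED = (
    "ENVIRONMENT", "DATABASE_URL", "REDIS_URL", "JWT_SECRET_KEY",
    "S3_BUCKET", "S3_REGION", "S3_ACCESS_KEY", "S3_SECRET_KEY", "CORS_ORIGINS",
)


def check_production_env(env: Mapping[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the config looks sane."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    if env.get("ENVIRONMENT") and env["ENVIRONMENT"] != "production":
        problems.append("ENVIRONMENT must be 'production' on Render")
    secret = env.get("JWT_SECRET_KEY", "")
    if secret and len(secret) < 32:
        problems.append("JWT_SECRET_KEY is shorter than 32 characters")
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    if "*" in origins:
        problems.append("CORS_ORIGINS must not contain '*' in production")
    return problems


if __name__ == "__main__":
    import os
    import sys

    issues = check_production_env(os.environ)
    for issue in issues:
        print("config error:", issue, file=sys.stderr)
    sys.exit(1 if issues else 0)
```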
#### Recommended
| Name | Default | Description |
|---|---|---|
| `ACCESS_TOKEN_MINUTES` | `30` | Short access-token TTL; limits how long a stolen token stays usable |
| `REFRESH_TOKEN_DAYS` | `7` | Refresh-token TTL; balances UX against risk |
| `MAX_IMAGE_SIZE_MB` | `12` | Per-image upload limit |
| `MAX_IMAGES_SYNC` | `5` | Sync mode cap |
| `MAX_IMAGES_ASYNC` | `20` | Async mode cap |
| `SENTRY_DSN` | (blank) | Enable Sentry error tracking |
| `LOG_LEVEL` | `INFO` | `DEBUG` for troubleshooting; keep `INFO` in prod |
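The three image limits combine into a simple per-request check. The sketch below shows one way to read them with the documented defaults and apply them. It is an illustration, not the backend's actual validation code. It assumes that "MB" in `MAX_IMAGE_SIZE_MB` means mebibytes (2^20 bytes) and that a request's mode is either `sync` or `async`.

```python
import os
from collections.abc import Mapping, Sequence

MIB = 1024 * 1024  # assumption: "MB" in MAX_IMAGE_SIZE_MB means mebibytes


def upload_limits(env: Mapping[str, str]) -> dict[str, int]:
    """Read the image limits from the environment, using the documented defaults."""
    return {
        "max_image_bytes": int(env.get("MAX_IMAGE_SIZE_MB", "12")) * MIB,
        "sync": int(env.get("MAX_IMAGES_SYNC", "5")),
        "async": int(env.get("MAX_IMAGES_ASYNC", "20")),
    }


def check_upload(sizes: Sequence[int], mode: str, env: Mapping[str, str]) -> list[str]:
    """Return reasons to reject a request whose images have the given byte sizes."""
    if mode not in ("sync", "async"):
        raise ValueError(f"unknown mode {mode!r}")
    limits = upload_limits(env)
    errors = []
    if len(sizes) > limits[mode]:
        errors.append(f"{len(sizes)} images exceed the {mode} cap of {limits[mode]}")
    for index, size in enumerate(sizes):
        if size > limits["max_image_bytes"]:
            errors.append(f"image {index} is {size} bytes, over the per-image limit")
    return errors


if __name__ == "__main__":
    print(upload_limits(os.environ))
```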
#### ML service
| Name | Default | Description |
|---|---|---|
| `ML_MODEL_DIR` | `/app/models` | Path to YOLO `.pt` weight files inside the container |
| `ML_DEVICE` | `cpu` | `cuda` requires a GPU instance, which Render does not offer. Keep `cpu` on Render and, in production, offload heavy ML to a separate GPU host or external service (see the GPU note below) |
> **GPU note**: Render does not currently offer GPU instances. For the pilot, the backend runs YOLO on CPU, which is roughly 5–10× slower than on a GPU. For production loads above ~50 inspections/hour, host the ML service separately on a GPU VPS (Hetzner, RunPod, etc.) and point `ML_SERVICE_URL` at it. The architecture diagram is in [README.md](../README.md#architecture).
### 2c. Build & start commands
If `render.yaml` is missing these, set them manually:
**hasari-api** (web service):
- Build: `pip install -r services/backend/requirements.txt`
- Start: `cd services/backend && alembic upgrade head && uvicorn main:app --host 0.0.0.0 --port $PORT`
**hasari-worker** (background worker):
- Build: same
- Start: `cd services/backend && celery -A worker worker --loglevel=info --concurrency=2`
---
## Step 3: First deploy
1. Both services auto-deploy on push to `main`. Trigger the first deploy:
   - Go to **hasari-api** → **Manual Deploy** → **Deploy latest commit**.
2. Watch the build log. Expected duration: 8–15 minutes, mostly spent downloading PyTorch and the YOLO weights.
3. The first start runs `alembic upgrade head`, which creates the database schema.
4. Once "Your service is live" appears, hit the health endpoint:
```bash
curl https://hasari-api.onrender.com/health
```
Expected:
```json
{"status":"ok","ml_loaded":true,"timestamp":"2026-05-15T...","version":"0.1.0"}
```
If `ml_loaded` is `false`, check the start log for "ML pipeline init failed". This usually means the model weights are missing from the container. To confirm, open a shell on the service (Render dashboard → **Shell** tab) and run `python -c "from ml_service import ml_pipeline; print(ml_pipeline.is_loaded())"`.
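To make the health check scriptable, for example as a post-deploy CI step, here is a minimal standard-library sketch. It treats the deploy as healthy only when `/health` returns 200 with `status` equal to `ok` and `ml_loaded` true, which matches the expected response above. The default base URL is this guide's example hostname; pass your own as the first argument.

```python
import json
import sys
import urllib.request


def is_healthy(status_code: int, payload: dict) -> bool:
    """Healthy means HTTP 200, status 'ok', and the ML pipeline loaded."""
    return (
        status_code == 200
        and payload.get("status") == "ok"
        and payload.get("ml_loaded") is True
    )


def fetch_health(base_url: str, timeout: float = 10.0) -> tuple[int, dict]:
    """GET <base_url>/health; urllib raises HTTPError on non-2xx responses."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return resp.status, json.load(resp)


if __name__ == "__main__":
    base = sys.argv[1] if len(sys.argv) > 1 else "https://hasari-api.onrender.com"
    code, body = fetch_health(base)
    print(code, body)
    sys.exit(0 if is_healthy(code, body) else 1)
```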
---
## Step 4: Create the admin user
The first admin must be created out-of-band because there is no admin-registration UI. Open a shell in the API container (Render dashboard → **Shell** tab) and run the following, substituting your own email and password:
```bash
cd services/backend
python - <<'PY'
import os

from database import init_db
from auth import _repo
from security import hash_password
import psycopg

init_db()
user = _repo.create(
    email="admin@yourcompany.com",
    password_hash=hash_password("CHANGE_ME_now_strong_password"),
    full_name="Admin",
)

# Promote the new account to admin. The URL is read from the environment instead
# of being interpolated into the script, so characters in it cannot break quoting.
with psycopg.connect(os.environ["DATABASE_URL"]) as conn:
    conn.execute("UPDATE users SET role = %s WHERE id = %s", ("admin", user["id"]))
    conn.commit()

print("Admin user created:", user["email"])
PY
```
Sign in immediately at `https://hasari.app/login` and rotate the password through the UI.
---
## Step 5: Smoke test checklist
Before announcing the deploy is "done", run through this list. Each item should pass on the first try.
- [ ] `GET /health` returns 200 with `ml_loaded: true`
- [ ] `GET /api/v1/version` returns expected git SHA and build time
- [ ] `POST /auth/register` with a new email returns 201 + token pair
- [ ] `POST /auth/login` with that email returns 200 + new token pair
- [ ] `GET /auth/me` with the access token returns the user
- [ ] `POST /auth/refresh` with the refresh token returns a new token pair
- [ ] `POST /api/v1/inspect/sync` with a 1MB JPG returns 200 with parts/damages JSON within 15 seconds
- [ ] `GET /api/v1/inspect` returns the inspection in the list
- [ ] `GET /api/v1/inspect/{id}/visualization/annotated` redirects to a presigned S3 URL that returns a PNG
- [ ] `DELETE /api/v1/inspect/{id}` removes it (subsequent GET returns 404)
- [ ] Web app loads at custom domain, language defaults to TR
- [ ] Sign in via web app, complete one inspection end-to-end
- [ ] Open Render logs β no `ERROR` or `CRITICAL` entries in the past hour
- [ ] Postgres connection count < 20 (visible in Render → hasari-db → Metrics)
If any item fails, **do not** announce the launch. See [Troubleshooting](#troubleshooting).
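The four auth items in the checklist can be scripted. The sketch below rests on several assumptions that this guide does not state, so confirm them against [API_GUIDE.md](API_GUIDE.md) first:

- Request bodies use `email`, `password` and `refresh_token` fields.
- Token responses carry `access_token` and `refresh_token`.
- `/auth/me` echoes the user's `email`.
- The routes are mounted exactly at `/auth/...`, as written in the checklist.

The HTTP transport is injected as a function, so the flow can be exercised against a fake without a live server.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional

# A transport: (method, path, json_body, bearer_token) -> (status_code, parsed_json).
Send = Callable[[str, str, Optional[dict], Optional[str]], tuple[int, dict]]


def http_sender(base_url: str, timeout: float = 15.0) -> Send:
    """Build a urllib-based transport; HTTP error statuses are returned, not raised."""

    def send(method, path, body, token):
        data = json.dumps(body).encode("utf-8") if body is not None else None
        req = urllib.request.Request(base_url + path, data=data, method=method)
        req.add_header("Content-Type", "application/json")
        if token:
            req.add_header("Authorization", f"Bearer {token}")
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.status, json.load(resp)
        except urllib.error.HTTPError as err:
            return err.code, {}

    return send


def auth_smoke_test(send: Send, email: str, password: str) -> list[str]:
    """Run register, login, me, refresh in order; return the checks that failed."""
    failures = []
    creds = {"email": email, "password": password}  # assumed field names
    status, _ = send("POST", "/auth/register", creds, None)
    if status != 201:
        failures.append(f"register returned {status}, expected 201")
    status, tokens = send("POST", "/auth/login", creds, None)
    if status != 200 or "access_token" not in tokens:
        failures.append(f"login returned {status}")
        return failures  # the remaining checks need a token pair
    status, me = send("GET", "/auth/me", None, tokens["access_token"])
    if status != 200 or me.get("email") != email:
        failures.append(f"me returned {status}")
    refresh_body = {"refresh_token": tokens.get("refresh_token")}  # assumed field name
    status, fresh = send("POST", "/auth/refresh", refresh_body, None)
    if status != 200 or "access_token" not in fresh:
        failures.append(f"refresh returned {status}")
    return failures


if __name__ == "__main__":
    import sys
    import uuid

    base = sys.argv[1] if len(sys.argv) > 1 else "https://hasari-api.onrender.com"
    email = f"smoke-{uuid.uuid4().hex[:8]}@example.com"
    problems = auth_smoke_test(http_sender(base), email, "Smoke-test-Passw0rd!")
    print("\n".join(problems) or "auth smoke test passed")
    sys.exit(1 if problems else 0)
```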
---
## Monitoring & log access
### Logs
- **Render dashboard**: hasari-api → **Logs** tab for a live tail.
- **CLI**: `render logs --service hasari-api --tail` (install [render-cli](https://render.com/docs/cli)).
- **Structured JSON**: every log line is `{"time":..., "level":..., "logger":..., "msg":...}`; pipe it to `jq` to filter.
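Where `jq` is not installed, the same filtering takes a few lines of Python. This sketch reads log lines in the JSON shape shown above from standard input and prints only `ERROR` and `CRITICAL` records, which is what the smoke-test item about error-free logs looks for. Lines that are not JSON objects, such as build output, are skipped.

```python
import json
import sys
from collections.abc import Iterable, Iterator

SEVERE = {"ERROR", "CRITICAL"}


def severe_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed log records whose level is ERROR or CRITICAL."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a structured log line (e.g. build output)
        if isinstance(record, dict) and str(record.get("level", "")).upper() in SEVERE:
            yield record


if __name__ == "__main__":
    for rec in severe_entries(sys.stdin):
        print(rec.get("time"), rec.get("level"), rec.get("logger"), rec.get("msg"))
```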
### Metrics
- **Render built-in**: CPU, memory, response time, throughput visible per service in the dashboard.
- **Prometheus**: scrape `https://hasari-api.onrender.com/metrics` (requires `Authorization: Bearer <admin token>`). See `observability/` for a Grafana dashboard JSON to import.
### Alerts
Configure in Render → service → **Notifications**:
- **Deploy failed** → Slack/email
- **Service crashed** → on-call rotation
- **Disk usage > 80%** → Slack
For app-level alerts (error rate > 1%, p95 latency > 3s), set up Sentry alerts on the `SENTRY_DSN` project.
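Both app-level thresholds come down to simple arithmetic over a window of requests. The sketch below computes them and flags a breach of the 1% error-rate and 3 s p95 limits named above. It uses the nearest-rank percentile method, which is only one of several percentile definitions; Sentry and Render may interpolate differently, so expect small discrepancies. Counting only 5xx responses as errors is an assumption.

```python
from collections.abc import Sequence


def p95(latencies_s: Sequence[float]) -> float:
    """95th percentile by nearest rank: the ceil(0.95 * n)-th smallest sample."""
    if not latencies_s:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_s)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def alert_reasons(
    latencies_s: Sequence[float],
    status_codes: Sequence[int],
    max_error_rate: float = 0.01,
    max_p95_s: float = 3.0,
) -> list[str]:
    """List the app-level alert thresholds this window of requests breaches."""
    reasons = []
    # Assumption: only 5xx responses count as errors; 4xx are client mistakes.
    errors = sum(1 for code in status_codes if code >= 500)
    rate = errors / len(status_codes) if status_codes else 0.0
    if rate > max_error_rate:
        reasons.append(f"error rate {rate:.1%} exceeds {max_error_rate:.0%}")
    if latencies_s:
        latency = p95(latencies_s)
        if latency > max_p95_s:
            reasons.append(f"p95 latency {latency:.2f}s exceeds {max_p95_s:g}s")
    return reasons
```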
---
## Rolling back a bad deploy
If the latest deploy is broken:
1. Render dashboard → **hasari-api** → **Events** tab.
2. Find the last known-good deploy (green checkmark).
3. Click **Rollback to this deploy**.
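4. Confirm. Render redeploys the previous Docker image, which takes about 30 seconds.
Rolling back the service does not revert database migrations. If the bad deploy ran a migration, downgrade it manually. Where possible, do this before rolling back the code: the previous image does not contain the newer migration script, so Alembic cannot downgrade from it.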
```bash
# In the Render shell:
cd services/backend
alembic downgrade -1
```
**Important**: a `downgrade()` that drops a column or table destroys the data in it, and no downgrade can bring back data that the upgrade already dropped. Before launch, test every migration's `downgrade()` against a copy of production data.
---
## Cost estimate (monthly, pilot scale)
| Item | Plan | Cost |
|---|---|---|
| Render web service (`hasari-api`) | Starter (512 MB) | $7 |
| Render background worker (`hasari-worker`) | Starter (512 MB) | $7 |
| Render Postgres | Starter | $7 |
| Render Redis | Starter | $10 |
| AWS S3 (10 GB storage, 100k req/month) | Pay-as-you-go | ~$1 |
| AWS data transfer (out) | Pay-as-you-go | ~$2 |
| Custom domain | (you own it) | $0 |
| Sentry (free tier) | Developer | $0 |
| **Total** | | **~$34/month** |
Scaling beyond ~500 inspections/day will require:
- Larger Render plans (Standard: $25/service)
- Moving ML to a GPU VPS (Hetzner GPU: $80/month)
- S3 storage growth: $0.023/GB/month
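The table's total and the storage-growth figure are plain sums. The sketch below reproduces the ~$34 pilot total from the line items above, omitting the $0 items. It also projects S3 storage cost at the quoted $0.023/GB/month. The 20 GB/month growth rate in the example is a made-up illustration, not a measured figure.

```python
# Monthly line items from the cost table above (USD); $0 items omitted.
PILOT_ITEMS = {
    "hasari-api (Starter)": 7.0,
    "hasari-worker (Starter)": 7.0,
    "Postgres (Starter)": 7.0,
    "Redis (Starter)": 10.0,
    "S3 storage + requests": 1.0,
    "S3 data transfer out": 2.0,
}
S3_USD_PER_GB_MONTH = 0.023


def monthly_total(items: dict[str, float]) -> float:
    """Sum of all monthly line items."""
    return sum(items.values())


def s3_storage_cost(gb_stored: float) -> float:
    """Storage-only S3 cost for one month at the quoted per-GB rate."""
    return gb_stored * S3_USD_PER_GB_MONTH


if __name__ == "__main__":
    print(f"pilot total: ${monthly_total(PILOT_ITEMS):.2f}/month")
    # Illustrative only: storage grows by a hypothetical 20 GB each month.
    for month in range(1, 13):
        print(f"month {month:2d}: S3 storage ${s3_storage_cost(20 * month):.2f}")
```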
---
## Troubleshooting
### `ml_loaded: false` at startup
**Cause**: model weights missing or wrong path.
**Fix**: ensure `services/ml/yolo11m-seg.pt`, `yolo11s-seg.pt`, and `yolo11n-cls.pt` are committed to the repo or downloaded in the build step. Also check that `ML_MODEL_DIR` points at the directory that actually contains them.
### 503 "Is kuyrugu su an kullanilamiyor" (job queue currently unavailable)
**Cause**: Celery worker can't reach Redis.
**Fix**: confirm `REDIS_URL` is set on **hasari-worker** (not just on `hasari-api`), then restart the worker.
### Postgres connection limit exceeded
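**Cause**: too many open connections, usually from a long-running query or leaked sessions.
**Fix**: check Render → Postgres → Metrics → "Active connections". Restarting the API service drops them. To keep the count bounded, cap `pool_size` and `max_overflow` on the SQLAlchemy engine so that the worst case across all API and worker processes stays below the plan's connection limit. Adding `pool_pre_ping=True` and `pool_recycle=300` also helps by replacing stale or long-lived connections instead of failing on them.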
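To size the pools, multiply out the worst case. Each SQLAlchemy engine can open up to `pool_size + max_overflow` connections (the defaults are 5 and 10, so up to 15 per engine), and each OS process that creates its own engine multiplies that. In the sketch below, the process counts, pool settings and plan limit are hypothetical. Look up the actual connection limit for your Postgres plan, and count roughly one engine per uvicorn worker and per Celery child process.

```python
from dataclasses import dataclass


@dataclass
class ProcessGroup:
    name: str
    processes: int      # OS processes that each create their own engine
    pool_size: int      # value passed as create_engine(pool_size=...)
    max_overflow: int   # value passed as create_engine(max_overflow=...)

    def worst_case(self) -> int:
        """Maximum connections this group can hold open at once."""
        return self.processes * (self.pool_size + self.max_overflow)


def connection_budget(groups: list[ProcessGroup], plan_limit: int, reserve: int = 5) -> int:
    """Return spare connections; a negative result means Postgres can be exhausted.

    `reserve` keeps headroom for psql shells, migrations and monitoring.
    """
    used = sum(g.worst_case() for g in groups)
    return plan_limit - reserve - used


if __name__ == "__main__":
    groups = [
        ProcessGroup("hasari-api", processes=2, pool_size=5, max_overflow=5),
        ProcessGroup("hasari-worker", processes=2, pool_size=2, max_overflow=0),
    ]
    # Hypothetical limit: check your Render Postgres plan for the real number.
    print("spare connections:", connection_budget(groups, plan_limit=50))
```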
### S3 403 on upload
**Cause**: IAM policy doesn't grant `s3:PutObject`, or bucket name typo, or wrong region.
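**Fix**: with the same credentials, run `aws s3 ls s3://your-bucket-name --region eu-central-1`. This confirms the bucket name, region, and `s3:ListBucket`, but not uploads. To test `s3:PutObject`, also run `echo ok | aws s3 cp - s3://your-bucket-name/smoke-test.txt --region eu-central-1`.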
### Web app CORS error
**Cause**: `CORS_ORIGINS` doesn't include the exact origin the browser sends. Include the `https://` scheme and omit any trailing slash.
**Fix**: update the env var and redeploy the API.
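Most CORS mismatches come from a trailing slash, a missing scheme, or stray whitespace in the comma-separated list. The hypothetical pre-deploy check below validates the string you are about to paste into `CORS_ORIGINS`. How the backend itself parses the variable is not covered in this guide, so the check conservatively flags surrounding spaces too.

```python
from urllib.parse import urlsplit


def check_cors_origins(value: str) -> list[str]:
    """Return problems with a comma-separated CORS_ORIGINS value (empty list = OK)."""
    problems = []
    for raw in value.split(","):
        origin = raw.strip()
        if not origin:
            problems.append("empty entry (doubled or trailing comma?)")
            continue
        if raw != origin:
            # Conservative: we don't know whether the backend strips whitespace.
            problems.append(f"{raw!r}: remove the surrounding spaces")
            continue
        if origin == "*":
            problems.append("'*' is not allowed in production")
            continue
        parts = urlsplit(origin)
        if parts.scheme != "https":
            problems.append(f"{origin!r}: must start with https://")
        elif not parts.netloc:
            problems.append(f"{origin!r}: missing host")
        elif parts.path or parts.query or parts.fragment:
            problems.append(f"{origin!r}: drop the trailing slash or path")
    return problems
```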
### Health check passing but inspections always fail with 500
**Cause**: usually an unhandled exception in the ML pipeline (corrupt model, OOM, missing class).
**Fix**: enable `LOG_LEVEL=DEBUG`, reproduce, read the traceback in logs. If it's OOM, upgrade to Standard plan or move ML off-box.
### Migration locked / `alembic upgrade head` hangs
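**Cause**: stock Alembic does not keep a lock table. A hang almost always means a DDL statement in the migration is waiting for a lock held by another session. Typical culprits are the still-running previous API instance or a connection left idle inside a transaction.
**Fix**: in psql, list blocked sessions and their blockers with `SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, query FROM pg_stat_activity WHERE cardinality(pg_blocking_pids(pid)) > 0;`. End the blocker with `SELECT pg_terminate_backend(<pid>);`, then retry. Setting `lock_timeout` for the migration session (for example `SET lock_timeout = '10s'`) makes the migration fail fast instead of hanging.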
---
## Post-launch monitoring (first 48 hours)
Watch these metrics every 4 hours for the first two days:
- **Error rate** (Sentry): target < 0.5% of requests
- **API latency p95** (Render metrics): target < 1.5 s for non-inspection endpoints
- **Inspection latency p95**: target < 12 s end-to-end for 4-photo batches
- **Database active connections**: target < 50% of pool max
- **Redis memory**: target < 80% of plan limit
- **Failed inspection rate**: target < 2% of jobs reaching `failed`
If any metric exceeds target for 30+ minutes, treat as a P1 incident.
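The 30-minute rule is easy to apply mechanically when metrics are sampled at fixed intervals. The sketch below assumes a particular input shape: `(minute, value)` samples for one metric, sorted by time, with the target expressed as an upper bound. It reports whether the metric stayed above target for 30 consecutive minutes or more.

```python
from collections.abc import Sequence


def sustained_breach(
    samples: Sequence[tuple[float, float]],
    target: float,
    min_minutes: float = 30.0,
) -> bool:
    """True if value > target continuously for at least min_minutes.

    samples: (minute, value) pairs sorted by minute. A breach run starts at the
    first sample above target and is reset by any sample at or below target.
    """
    run_start = None
    for minute, value in samples:
        if value > target:
            if run_start is None:
                run_start = minute
            if minute - run_start >= min_minutes:
                return True
        else:
            run_start = None
    return False
```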
---
## Related docs
- [API_GUIDE.md](API_GUIDE.md): REST contract for smoke tests
- [AUTH_FLOW.md](AUTH_FLOW.md): token lifecycle and the env vars that control it
- [LAUNCH_CHECKLIST.md](LAUNCH_CHECKLIST.md): pre-go-live sign-off gates
- [OBSERVABILITY_SETUP.md](OBSERVABILITY_SETUP.md): Prometheus + Grafana wiring