# Deploy Guide — Render.com

End-to-end walkthrough for deploying Hasarİ to Render.com using `render.yaml`. Covers prerequisites, infra setup, environment configuration, the first deploy, smoke tests, monitoring, rollback, and cost.

> Target audience: anyone with shell access and a Render account, no prior Render experience required.

---

## Prerequisites

Before you start, you need:

| Item | Why | How to get it |
|---|---|---|
| **Render account** | Hosts the API + worker | [render.com](https://render.com) — free tier ok for the web service; Postgres/Redis are paid |
| **GitHub access to the repo** | Render builds from git | `arac-hasar-v2` repo permissions |
| **AWS S3 bucket** (or compatible) | Image storage (uploads + visualizations) | Or use Cloudflare R2 / Backblaze B2 — anything S3-compatible |
| **AWS IAM access key** with `s3:GetObject`, `s3:PutObject`, `s3:DeleteObject` on the bucket | Backend uploads/signs URLs | AWS console → IAM |
| **Custom domain** (optional) | Branded URL | Any registrar, point CNAME at Render |
| **A strong `JWT_SECRET_KEY`** | Signs auth tokens | `openssl rand -base64 48` |
| **Sentry DSN** (optional) | Error tracking | [sentry.io](https://sentry.io) |

**Time estimate**: 45–90 minutes for the first deploy. Subsequent deploys are git-push fast.

---

## Step 1 — Provision infrastructure

The `render.yaml` at the repo root declares the services, and Render creates them in one go when you connect the repo (Step 2a). The Postgres database, Redis instance, and S3 bucket are set up by hand first, as described below.

### 1a. Create the Postgres database

In the Render dashboard:

1. **New +** → **PostgreSQL**
2. **Name**: `hasari-db`
3. **Database**: `hasari`
4. **User**: `hasari`
5. **Region**: pick the one nearest your users (Frankfurt for EU/TR, Oregon for US)
6. **Plan**: **Starter** ($7/month) is sufficient for the pilot. Free tier expires after 90 days — do not use it for production.
7. **Create database**.

Copy the **Internal Database URL** (format: `postgres://hasari:…@…/hasari`). You'll paste it as `DATABASE_URL` in Step 2b.

### 1b. Create the Redis instance

1. **New +** → **Redis**
2. **Name**: `hasari-redis`
3. **Region**: same as Postgres
4. **Plan**: **Starter** ($10/month). Free tier has no persistence; do not use it for production.
5. **Maxmemory policy**: `allkeys-lru`
6. **Create Redis**.

Copy the **Internal Redis URL** — you'll use it as `REDIS_URL`.

### 1c. Create the S3 bucket

In the AWS console:

1. S3 → **Create bucket** → `hasari-uploads-prod` (or your name), region matching the API for low latency
2. **Block all public access** — yes, keep all four boxes checked. The backend serves presigned URLs; no public listing.
3. **Versioning**: disabled (uploads are immutable; no need for revisions)
4. **Server-side encryption**: SSE-S3 (default) is fine
5. After creation: **Permissions** → **CORS** → add:

   ```json
   [
     {
       "AllowedHeaders": ["*"],
       "AllowedMethods": ["GET", "PUT", "POST"],
       "AllowedOrigins": ["https://hasari.app", "https://hasari-api.onrender.com"],
       "ExposeHeaders": ["ETag", "x-amz-request-id"],
       "MaxAgeSeconds": 3000
     }
   ]
   ```

   Replace the origins with your actual web app URL.

6. IAM → create an IAM user `hasari-backend`, attach an inline policy:

   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
         "Resource": "arn:aws:s3:::hasari-uploads-prod/*"
       },
       {
         "Effect": "Allow",
         "Action": "s3:ListBucket",
         "Resource": "arn:aws:s3:::hasari-uploads-prod"
       }
     ]
   }
   ```

   Create an access key for this user and save both halves now; the secret cannot be retrieved later.

---

## Step 2 — Configure the backend service

### 2a. Connect the repo

1. Render dashboard → **New +** → **Blueprint**
2. Connect your GitHub account, pick `arac-hasar-v2`
3. Render detects `render.yaml` and shows the services it will create:
   - `hasari-api` — web service (FastAPI)
   - `hasari-worker` — background worker (Celery)
4. Click **Apply** to create both.

### 2b. Set environment variables

For **each** of `hasari-api` and `hasari-worker`, go to **Environment** and add:

#### Required

| Name | Example | Description | Security note |
|---|---|---|---|
| `ENVIRONMENT` | `production` | Disables dev auth fallback | Never set to `dev` here |
| `DATABASE_URL` | `postgres://…` | From step 1a | Internal URL only; no external traffic |
| `REDIS_URL` | `redis://…` | From step 1b | Internal URL only |
| `JWT_SECRET_KEY` | (32+ char random string) | Signs JWTs | Generate fresh: `openssl rand -base64 48`. Rotating invalidates all existing sessions. |
| `S3_BUCKET` | `hasari-uploads-prod` | Bucket name | — |
| `S3_REGION` | `eu-central-1` | AWS region | — |
| `S3_ACCESS_KEY` | `AKIA…` | From step 1c | Use a dedicated IAM user, not your root key |
| `S3_SECRET_KEY` | `…` | From step 1c | Mark this var as "secret" in Render UI |
| `S3_ENDPOINT_URL` | (blank for AWS) | Set only for R2/MinIO/B2 | — |
| `CORS_ORIGINS` | `https://hasari.app,https://www.hasari.app` | Comma-separated allowed web origins | Never use `*` in production |

#### Recommended

| Name | Default | Description |
|---|---|---|
| `ACCESS_TOKEN_MINUTES` | `30` | Short access TTL — keeps damage from a stolen token bounded |
| `REFRESH_TOKEN_DAYS` | `7` | Refresh TTL — balance UX vs. risk |
| `MAX_IMAGE_SIZE_MB` | `12` | Per-image upload limit |
| `MAX_IMAGES_SYNC` | `5` | Sync mode cap |
| `MAX_IMAGES_ASYNC` | `20` | Async mode cap |
| `SENTRY_DSN` | (blank) | Enable Sentry error tracking |
| `LOG_LEVEL` | `INFO` | `DEBUG` for troubleshooting; keep `INFO` in prod |

#### ML service

| Name | Default | Description |
|---|---|---|
| `ML_MODEL_DIR` | `/app/models` | Path to YOLO `.pt` weight files inside the container |
| `ML_DEVICE` | `cpu` | `cuda` requires a GPU instance (Render does not offer GPU — keep CPU on Render and offload heavy ML to a separate GPU host or external service for production) |

> **GPU note**: Render does not currently offer GPU instances. For the pilot, the backend runs YOLO on CPU, which is roughly 5–10× slower than on a GPU. For production loads above ~50 inspections/hour, host the ML service separately on a GPU VPS (Hetzner, RunPod, etc.) and point `ML_SERVICE_URL` at it. See the architecture diagram in [README.md](../README.md#architecture).

### 2c. Build & start commands

If `render.yaml` is missing these, set them manually:

**hasari-api** (web service):

- Build: `pip install -r services/backend/requirements.txt`
- Start: `cd services/backend && alembic upgrade head && uvicorn main:app --host 0.0.0.0 --port $PORT`

**hasari-worker** (background worker):

- Build: same as `hasari-api`
- Start: `cd services/backend && celery -A worker worker --loglevel=info --concurrency=2`

---

## Step 3 — First deploy

1. Both services auto-deploy on push to `main`. Trigger the first deploy:
   - Go to **hasari-api** → **Manual Deploy** → **Deploy latest commit**.
2. Watch the build log. Expected duration: 8–15 minutes (downloads PyTorch, YOLO weights).
3. The first start runs `alembic upgrade head`, which creates the DB schema.
4. Once "Your service is live" appears, hit the health endpoint:

   ```bash
   curl https://hasari-api.onrender.com/health
   ```

   Expected:

   ```json
   {"status":"ok","ml_loaded":true,"timestamp":"2026-05-15T...","version":"0.1.0"}
   ```

If `ml_loaded` is `false`, check the start log for "ML pipeline init failed". This usually means the model weights are missing from the container. To confirm, open a shell on the service and run `python -c "from ml_service import ml_pipeline; print(ml_pipeline.is_loaded())"`.

---

## Step 4 — Create the admin user

The first admin must be created out-of-band — there is no admin-registration UI. Open a shell in the API container via the Render dashboard (**Shell** tab) and run:

```bash
cd services/backend
python -c "
from database import init_db
from auth import _repo
from security import hash_password

init_db()
user = _repo.create(
    email='admin@yourcompany.com',
    password_hash=hash_password('CHANGE_ME_now_strong_password'),
    full_name='Admin',
)

# Promote the new user to the admin role.
# The shell expands \$DATABASE_URL before Python runs.
import psycopg
with psycopg.connect('$DATABASE_URL') as conn:
    conn.execute('UPDATE users SET role=%s WHERE id=%s', ('admin', user['id']))
    conn.commit()

print('Admin user created:', user['email'])
"
```

Sign in immediately at `https://hasari.app/login` and rotate the password through the UI.

---

## Step 5 — Smoke test checklist

Before announcing the deploy is "done", run through this list. Each item should pass on the first try.

- [ ] `GET /health` returns 200 with `ml_loaded: true`
- [ ] `GET /api/v1/version` returns expected git SHA and build time
- [ ] `POST /auth/register` with a new email returns 201 + token pair
- [ ] `POST /auth/login` with that email returns 200 + new token pair
- [ ] `GET /auth/me` with the access token returns the user
- [ ] `POST /auth/refresh` with the refresh token returns a new token pair
- [ ] `POST /api/v1/inspect/sync` with a 1 MB JPG returns 200 with parts/damages JSON within 15 seconds
- [ ] `GET /api/v1/inspect` returns the inspection in the list
- [ ] `GET /api/v1/inspect/{id}/visualization/annotated` redirects to a presigned S3 URL that returns a PNG
- [ ] `DELETE /api/v1/inspect/{id}` removes it (subsequent GET returns 404)
- [ ] Web app loads at custom domain, language defaults to TR
- [ ] Sign in via web app, complete one inspection end-to-end
- [ ] Open Render logs — no `ERROR` or `CRITICAL` entries in the past hour
- [ ] Postgres connection count < 20 (visible in Render → hasari-db → Metrics)

If any item fails, **do not** announce the launch. See [Troubleshooting](#troubleshooting).

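
The first checklist item is easy to automate. The following is a minimal sketch, assuming the `/health` response shape shown in Step 3. The helper names `fetch_json`, `check_health`, and `smoke_health` are hypothetical and not part of the Hasarİ codebase; only the Python standard library is used.

```python
import json
import urllib.error
import urllib.request
from typing import Callable


def fetch_json(url: str, timeout: float = 10.0) -> tuple[int, dict]:
    """GET a URL and return (status_code, parsed JSON body)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # urlopen raises on 4xx/5xx; report the code with an empty body.
        return exc.code, {}


def check_health(status: int, body: dict) -> list[str]:
    """Return human-readable failures; an empty list means the check passed."""
    failures = []
    if status != 200:
        failures.append(f"/health returned HTTP {status}, expected 200")
    if body.get("status") != "ok":
        failures.append(f"status is {body.get('status')!r}, expected 'ok'")
    if body.get("ml_loaded") is not True:
        failures.append("ml_loaded is not true; check the model weights")
    return failures


def smoke_health(
    base_url: str,
    fetch: Callable[[str], tuple[int, dict]] = fetch_json,
) -> list[str]:
    """Run the /health check against a deployed API base URL."""
    status, body = fetch(base_url.rstrip("/") + "/health")
    return check_health(status, body)
```

Calling `smoke_health("https://hasari-api.onrender.com")` returns an empty list when the service is healthy. The `fetch` parameter is injectable so the check logic can be exercised without network access.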

---

## Monitoring & log access

### Logs

- **Render dashboard**: hasari-api → **Logs** tab — live tail.
- **CLI**: `render logs --service hasari-api --tail` (install [render-cli](https://render.com/docs/cli)).
- **Structured JSON**: every log line is `{"time":..., "level":..., "logger":..., "msg":...}` — pipe to `jq` for filtering.

### Metrics

- **Render built-in**: CPU, memory, response time, and throughput are visible per service in the dashboard.
- **Prometheus**: scrape `https://hasari-api.onrender.com/metrics` (requires an `Authorization: Bearer <token>` header). See `observability/` for a Grafana dashboard JSON to import.

### Alerts

Configure in Render → service → **Notifications**:

- **Deploy failed** → Slack/email
- **Service crashed** → on-call rotation
- **Disk usage > 80%** → Slack

For app-level alerts (error rate > 1%, p95 latency > 3s), set up Sentry alerts in the project that `SENTRY_DSN` points to.

---

## Rolling back a bad deploy

If the latest deploy is broken:

1. Render dashboard → **hasari-api** → **Events** tab.
2. Find the last known-good deploy (green checkmark).
3. Click **Rollback to this deploy**.
4. Confirm. Render redeploys the previous Docker image — takes ~30 seconds.

Rolling back the service does not undo database migrations. To revert the most recent migration manually:

```bash
# In the Render shell:
cd services/backend
alembic downgrade -1
```

**Important**: never `alembic downgrade` a migration that dropped a column with live data — you will lose data. Pre-launch, test every migration's `downgrade()` against a copy of production data.

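
The pre-launch downgrade test can be scripted as an upgrade, downgrade, upgrade round trip against a scratch database. This is a sketch under two assumptions that the guide does not confirm: the project's Alembic `env.py` reads the connection string from `DATABASE_URL`, and the scratch database is a restored copy of production. `ROUNDTRIP` and `run_roundtrip` are hypothetical names.

```python
import os
import subprocess
from typing import Callable

# Upgrade to head, step back one revision, then upgrade again.
ROUNDTRIP = (
    ("alembic", "upgrade", "head"),
    ("alembic", "downgrade", "-1"),
    ("alembic", "upgrade", "head"),
)


def run_roundtrip(
    scratch_db_url: str,
    cwd: str = "services/backend",
    run: Callable = subprocess.run,
) -> None:
    """Run the migration round trip against a scratch database.

    Assumes (not confirmed by the guide) that Alembic picks up the
    connection string from the DATABASE_URL environment variable.
    """
    # Refuse to run against the database the current shell points at.
    if scratch_db_url == os.environ.get("DATABASE_URL"):
        raise ValueError("scratch_db_url matches DATABASE_URL; use a copy")

    env = dict(os.environ, DATABASE_URL=scratch_db_url)
    for cmd in ROUNDTRIP:
        # check=True makes a failing migration raise CalledProcessError.
        run(list(cmd), cwd=cwd, env=env, check=True)
```

If any of the three commands fails, the run stops at that step, which points at the migration whose `upgrade()` or `downgrade()` needs attention.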

---

## Cost estimate (monthly, pilot scale)

| Item | Plan | Cost |
|---|---|---|
| Render web service (`hasari-api`) | Starter (512 MB) | $7 |
| Render background worker (`hasari-worker`) | Starter (512 MB) | $7 |
| Render Postgres | Starter | $7 |
| Render Redis | Starter | $10 |
| AWS S3 (10 GB storage, 100k req/month) | Pay-as-you-go | ~$1 |
| AWS data transfer (out) | Pay-as-you-go | ~$2 |
| Custom domain | (you own it) | $0 |
| Sentry (free tier) | Developer | $0 |
| **Total** | | **~$34/month** |

Scaling beyond ~500 inspections/day will require:

- Larger Render plans (Standard: $25/service)
- Moving ML to a GPU VPS (Hetzner GPU: $80/month)
- S3 storage growth: $0.023/GB/month

---

## Troubleshooting

### `ml_loaded: false` at startup

**Cause**: model weights missing or wrong path.

**Fix**: ensure `services/ml/yolo11m-seg.pt`, `yolo11s-seg.pt`, `yolo11n-cls.pt` are committed to the repo or downloaded in the build step. Check the `ML_MODEL_DIR` env var.

### 503 "Is kuyrugu su an kullanilamiyor"

The message is Turkish for "the job queue is currently unavailable".

**Cause**: Celery worker can't reach Redis.

**Fix**: confirm `REDIS_URL` is set on **hasari-worker** (not just on the API). Restart the worker.

### Postgres connection limit exceeded

**Cause**: too many open connections, usually from a long-running query or leaked sessions.

**Fix**: check Render → Postgres → Metrics → "Active connections". Restart the API service to drop them. Add `pool_pre_ping=True` and `pool_recycle=300` to the SQLAlchemy engine config.

### S3 403 on upload

**Cause**: IAM policy doesn't grant `s3:PutObject`, or bucket name typo, or wrong region.

**Fix**: run `aws s3 ls s3://your-bucket-name --region eu-central-1` from anywhere with the same credentials to verify the credentials, bucket name, and region. Listing only exercises `s3:ListBucket`; to check write access, upload a small test file with `aws s3 cp`.

### Web app CORS error

**Cause**: `CORS_ORIGINS` doesn't include the actual origin (don't forget `https://`, no trailing slash).

**Fix**: update env var, redeploy API.

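
A malformed `CORS_ORIGINS` value can be caught at startup instead of in the browser. Here is a minimal sketch, assuming the comma-separated format described in Step 2b. `parse_cors_origins` is a hypothetical helper, and the backend's actual parsing may differ.

```python
from urllib.parse import urlsplit


def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated CORS_ORIGINS value and validate each origin.

    Raises ValueError for a wildcard, a missing scheme, or a trailing
    slash/path, which are the mistakes named in the troubleshooting entry.
    """
    origins = []
    for item in raw.split(","):
        origin = item.strip()
        if not origin:
            continue  # tolerate stray commas and whitespace
        if origin == "*":
            raise ValueError("wildcard origin '*' is not allowed in production")
        parts = urlsplit(origin)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"origin {origin!r} must start with http:// or https://")
        if parts.path or parts.query or parts.fragment:
            raise ValueError(f"origin {origin!r} must not have a path or trailing slash")
        origins.append(origin)
    return origins
```

With this check, `https://hasari.app/` and `hasari.app` are both rejected, while `https://hasari.app,https://www.hasari.app` parses to two origins.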

### Health check passing but inspections always fail with 500

**Cause**: usually an unhandled exception in the ML pipeline (corrupt model, OOM, missing class).

**Fix**: enable `LOG_LEVEL=DEBUG`, reproduce, and read the traceback in the logs. If it's OOM, upgrade to the Standard plan or move ML off-box.

### Migration locked / `alembic upgrade head` hangs

**Cause**: a previous or concurrent migration still holds a lock, so the new run waits on it indefinitely.

**Fix**: in psql, look for the blocking session in `pg_stat_activity` and end it, or set the Postgres `lock_timeout` so the migration fails fast, then retry. If your setup uses a custom lock table (for example `alembic_locks`; stock Alembic does not create one, so check your Alembic config), clear it with `DELETE FROM alembic_locks;`.

---

## Post-launch monitoring (first 48 hours)

Watch these metrics every 4 hours for the first two days:

- **Error rate** (Sentry): target < 0.5% of requests
- **API latency p95** (Render metrics): target < 1.5 s for non-inspection endpoints
- **Inspection latency p95**: target < 12 s end-to-end for 4-photo batches
- **Database active connections**: target < 50% of pool max
- **Redis memory**: target < 80% of plan limit
- **Failed inspection rate**: target < 2% of jobs reaching `failed`

If any metric exceeds target for 30+ minutes, treat as a P1 incident.

---

## Related docs

- [API_GUIDE.md](API_GUIDE.md) — REST contract for smoke tests
- [AUTH_FLOW.md](AUTH_FLOW.md) — token lifecycle (env vars relevant)
- [LAUNCH_CHECKLIST.md](LAUNCH_CHECKLIST.md) — pre-go-live sign-off gates
- [OBSERVABILITY_SETUP.md](OBSERVABILITY_SETUP.md) — Prometheus + Grafana wiring