Spaces:
Running
Running
| # Deploy Guide β Render.com | |
| End-to-end walkthrough for deploying HasarΔ° to Render.com using `render.yaml`. Covers prerequisites, infra setup, environment configuration, the first deploy, smoke tests, monitoring, rollback, and cost. | |
| > Target audience: anyone with shell access and a Render account, no prior Render experience required. | |
| --- | |
| ## Prerequisites | |
| Before you start, you need: | |
| | Item | Why | How to get it | | |
| |---|---|---| | |
| | **Render account** | Hosts the API + worker | [render.com](https://render.com) β free tier ok for the web service; Postgres/Redis are paid | | |
| | **GitHub access to the repo** | Render builds from git | `arac-hasar-v2` repo permissions | | |
| | **AWS S3 bucket** (or compatible) | Image storage (uploads + visualizations) | Or use Cloudflare R2 / Backblaze B2 β anything S3-compatible | | |
| | **AWS IAM access key** with `s3:GetObject`, `s3:PutObject`, `s3:DeleteObject` on the bucket | Backend uploads/signs URLs | AWS console β IAM | | |
| | **Custom domain** (optional) | Branded URL | Any registrar, point CNAME at Render | | |
| | **A strong `JWT_SECRET_KEY`** | Signs auth tokens | `openssl rand -base64 48` | | |
| | **Sentry DSN** (optional) | Error tracking | [sentry.io](https://sentry.io) | | |
| **Time estimate**: 45β90 minutes for the first deploy. Subsequent deploys are git-push fast. | |
| --- | |
| ## Step 1 β Provision infrastructure | |
| The `render.yaml` at the repo root declares all services. Render reads it on first connect and creates everything in one go. | |
| ### 1a. Create the Postgres database | |
| In the Render dashboard: | |
| 1. **New +** β **PostgreSQL** | |
| 2. **Name**: `hasari-db` | |
| 3. **Database**: `hasari` | |
| 4. **User**: `hasari` | |
| 5. **Region**: pick the one nearest your users (Frankfurt for EU/TR, Oregon for US) | |
| 6. **Plan**: **Starter** ($7/month) is sufficient for the pilot. Free tier expires after 90 days β do not use it for production. | |
| 7. **Create database**. | |
| Copy the **Internal Database URL** (format: `postgres://hasari:β¦@β¦/hasari`). You'll paste it as `DATABASE_URL` in step 2. | |
| ### 1b. Create the Redis instance | |
| 1. **New +** β **Redis** | |
| 2. **Name**: `hasari-redis` | |
| 3. **Region**: same as Postgres | |
| 4. **Plan**: **Starter** ($10/month). Free tier has no persistence; do not use for production. | |
| 5. **Maxmemory policy**: `allkeys-lru` | |
| 6. **Create Redis**. | |
| Copy the **Internal Redis URL** β you'll use it as `REDIS_URL`. | |
| ### 1c. Create the S3 bucket | |
| In the AWS console: | |
| 1. S3 β **Create bucket** β `hasari-uploads-prod` (or your name), region matching the API for low latency | |
| 2. **Block all public access** β yes, keep all four boxes checked. The backend serves presigned URLs; no public listing. | |
| 3. **Versioning**: disabled (uploads are immutable; no need for revisions) | |
| 4. **Server-side encryption**: SSE-S3 (default) is fine | |
| 5. After creation: **Permissions** β **CORS** β add: | |
| ```json | |
| [ | |
| { | |
| "AllowedHeaders": ["*"], | |
| "AllowedMethods": ["GET", "PUT", "POST"], | |
| "AllowedOrigins": ["https://hasari.app", "https://hasari-api.onrender.com"], | |
| "ExposeHeaders": ["ETag", "x-amz-request-id"], | |
| "MaxAgeSeconds": 3000 | |
| } | |
| ] | |
| ``` | |
| Replace the origins with your actual web app URL. | |
| 6. IAM β create an IAM user `hasari-backend`, attach an inline policy: | |
| ```json | |
| { | |
| "Version": "2012-10-17", | |
| "Statement": [ | |
| { | |
| "Effect": "Allow", | |
| "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"], | |
| "Resource": "arn:aws:s3:::hasari-uploads-prod/*" | |
| }, | |
| { | |
| "Effect": "Allow", | |
| "Action": "s3:ListBucket", | |
| "Resource": "arn:aws:s3:::hasari-uploads-prod" | |
| } | |
| ] | |
| } | |
| ``` | |
| Create an access key for this user β save both halves now, you cannot retrieve the secret again. | |
| --- | |
| ## Step 2 β Configure the backend service | |
| ### 2a. Connect the repo | |
| 1. Render dashboard β **New +** β **Blueprint** | |
| 2. Connect your GitHub account, pick `arac-hasar-v2` | |
| 3. Render detects `render.yaml` and shows the services it will create: | |
| - `hasari-api` β web service (FastAPI) | |
| - `hasari-worker` β background worker (Celery) | |
| 4. Click **Apply** to create both. | |
| ### 2b. Set environment variables | |
| For **each** of `hasari-api` and `hasari-worker`, go to **Environment** and add: | |
| #### Required | |
| | Name | Example | Description | Security note | | |
| |---|---|---|---| | |
| | `ENVIRONMENT` | `production` | Disables dev auth fallback | Never set to `dev` here | | |
| | `DATABASE_URL` | `postgres://β¦` | From step 1a | Internal URL only; no external traffic | | |
| | `REDIS_URL` | `redis://β¦` | From step 1b | Internal URL only | | |
| | `JWT_SECRET_KEY` | (32+ char random string) | Signs JWTs | Generate fresh: `openssl rand -base64 48`. Rotating invalidates all existing sessions. | | |
| | `S3_BUCKET` | `hasari-uploads-prod` | Bucket name | β | | |
| | `S3_REGION` | `eu-central-1` | AWS region | β | | |
| | `S3_ACCESS_KEY` | `AKIAβ¦` | From step 1c | Use a dedicated IAM user, not your root key | | |
| | `S3_SECRET_KEY` | `β¦` | From step 1c | Mark this var as "secret" in Render UI | | |
| | `S3_ENDPOINT_URL` | (blank for AWS) | Set only for R2/MinIO/B2 | β | | |
| | `CORS_ORIGINS` | `https://hasari.app,https://www.hasari.app` | Comma-separated allowed web origins | Never use `*` in production | | |
| #### Recommended | |
| | Name | Default | Description | | |
| |---|---|---| | |
| | `ACCESS_TOKEN_MINUTES` | `30` | Short access TTL β keeps damage from a stolen token bounded | | |
| | `REFRESH_TOKEN_DAYS` | `7` | Refresh TTL β balance UX vs. risk | | |
| | `MAX_IMAGE_SIZE_MB` | `12` | Per-image upload limit | | |
| | `MAX_IMAGES_SYNC` | `5` | Sync mode cap | | |
| | `MAX_IMAGES_ASYNC` | `20` | Async mode cap | | |
| | `SENTRY_DSN` | (blank) | Enable Sentry error tracking | | |
| | `LOG_LEVEL` | `INFO` | `DEBUG` for troubleshooting; keep `INFO` in prod | | |
| #### ML service | |
| | Name | Default | Description | | |
| |---|---|---| | |
| | `ML_MODEL_DIR` | `/app/models` | Path to YOLO `.pt` weight files inside the container | | |
| | `ML_DEVICE` | `cpu` | `cuda` requires a GPU instance (Render does not offer GPU β keep CPU on Render and offload heavy ML to a separate GPU host or external service for production) | | |
| > **GPU note**: Render does not currently offer GPU instances. For the pilot, the backend runs YOLO on CPU β slower (~5β10Γ CPU vs. GPU). For production loads above ~50 inspections/hour, host the ML service separately on a GPU VPS (Hetzner, RunPod, etc.) and point `ML_SERVICE_URL` at it. Architecture diagram in [README.md](../README.md#architecture). | |
| ### 2c. Build & start commands | |
| If `render.yaml` is missing these, set them manually: | |
| **hasari-api** (web service): | |
| - Build: `pip install -r services/backend/requirements.txt` | |
| - Start: `cd services/backend && alembic upgrade head && uvicorn main:app --host 0.0.0.0 --port $PORT` | |
| **hasari-worker** (background worker): | |
| - Build: same | |
| - Start: `cd services/backend && celery -A worker worker --loglevel=info --concurrency=2` | |
| --- | |
| ## Step 3 β First deploy | |
| 1. Both services auto-deploy on push to `main`. Trigger the first deploy: | |
| - Go to **hasari-api** β **Manual Deploy** β **Deploy latest commit**. | |
| 2. Watch the build log. Expected duration: 8β15 minutes (downloads PyTorch, YOLO weights). | |
| 3. The first start runs `alembic upgrade head` β DB schema is created. | |
| 4. Once "Your service is live" appears, hit the health endpoint: | |
| ```bash | |
| curl https://hasari-api.onrender.com/health | |
| ``` | |
| Expected: | |
| ```json | |
| {"status":"ok","ml_loaded":true,"timestamp":"2026-05-15T...","version":"0.1.0"} | |
| ``` | |
| If `ml_loaded` is `false`, check the start log for "ML pipeline init failed" β usually means model weights are missing from the container. Run a one-off SSH session and run `python -c "from ml_service import ml_pipeline; print(ml_pipeline.is_loaded())"`. | |
| --- | |
| ## Step 4 β Create the admin user | |
| The first admin must be created out-of-band β there is no admin-registration UI. SSH into the API container via the Render dashboard (**Shell** tab) and run: | |
| ```bash | |
| cd services/backend | |
| python -c " | |
| from database import init_db | |
| from auth import _repo | |
| from security import hash_password | |
| init_db() | |
| user = _repo.create( | |
| email='admin@yourcompany.com', | |
| password_hash=hash_password('CHANGE_ME_now_strong_password'), | |
| full_name='Admin', | |
| ) | |
| # Promote | |
| import psycopg | |
| with psycopg.connect('$DATABASE_URL') as conn: | |
| conn.execute('UPDATE users SET role=%s WHERE id=%s', ('admin', user['id'])) | |
| conn.commit() | |
| print('Admin user created:', user['email']) | |
| " | |
| ``` | |
| Sign in immediately at `https://hasari.app/login` and rotate the password through the UI. | |
| --- | |
| ## Step 5 β Smoke test checklist | |
| Before announcing the deploy is "done", run through this list. Each item should pass on the first try. | |
| - [ ] `GET /health` returns 200 with `ml_loaded: true` | |
| - [ ] `GET /api/v1/version` returns expected git SHA and build time | |
| - [ ] `POST /auth/register` with a new email returns 201 + token pair | |
| - [ ] `POST /auth/login` with that email returns 200 + new token pair | |
| - [ ] `GET /auth/me` with the access token returns the user | |
| - [ ] `POST /auth/refresh` with the refresh token returns a new token pair | |
| - [ ] `POST /api/v1/inspect/sync` with a 1MB JPG returns 200 with parts/damages JSON within 15 seconds | |
| - [ ] `GET /api/v1/inspect` returns the inspection in the list | |
| - [ ] `GET /api/v1/inspect/{id}/visualization/annotated` redirects to a presigned S3 URL that returns a PNG | |
| - [ ] `DELETE /api/v1/inspect/{id}` removes it (subsequent GET returns 404) | |
| - [ ] Web app loads at custom domain, language defaults to TR | |
| - [ ] Sign in via web app, complete one inspection end-to-end | |
| - [ ] Open Render logs β no `ERROR` or `CRITICAL` entries in the past hour | |
| - [ ] Postgres connection count < 20 (visible in Render β hasari-db β Metrics) | |
| If any item fails, **do not** announce the launch. See [Troubleshooting](#troubleshooting). | |
| --- | |
| ## Monitoring & log access | |
| ### Logs | |
| - **Render dashboard**: hasari-api β **Logs** tab β live tail. | |
| - **CLI**: `render logs --service hasari-api --tail` (install [render-cli](https://render.com/docs/cli)). | |
| - **Structured JSON**: every log line is `{"time":..., "level":..., "logger":..., "msg":...}` β pipe to `jq` for filtering. | |
| ### Metrics | |
| - **Render built-in**: CPU, memory, response time, throughput visible per service in the dashboard. | |
| - **Prometheus**: scrape `https://hasari-api.onrender.com/metrics` (requires `Authorization: Bearer <admin token>`). See `observability/` for a Grafana dashboard JSON to import. | |
| ### Alerts | |
| Configure in Render β service β **Notifications**: | |
| - **Deploy failed** β Slack/email | |
| - **Service crashed** β on-call rotation | |
| - **Disk usage > 80%** β Slack | |
| For app-level alerts (error rate > 1%, p95 latency > 3s), set up Sentry alerts on the `SENTRY_DSN` project. | |
| --- | |
| ## Rolling back a bad deploy | |
| If the latest deploy is broken: | |
| 1. Render dashboard β **hasari-api** β **Events** tab. | |
| 2. Find the last known-good deploy (green checkmark). | |
| 3. Click **Rollback to this deploy**. | |
| 4. Confirm. Render redeploys the previous Docker image β takes ~30 seconds. | |
| For database migrations that cannot be rolled back automatically: | |
| ```bash | |
| # In the Render shell: | |
| cd services/backend | |
| alembic downgrade -1 | |
| ``` | |
| **Important**: never `alembic downgrade` a migration that dropped a column with live data β you will lose data. Pre-launch, test every migration's `downgrade()` against a copy of production data. | |
| --- | |
| ## Cost estimate (monthly, pilot scale) | |
| | Item | Plan | Cost | | |
| |---|---|---| | |
| | Render web service (`hasari-api`) | Starter (512 MB) | $7 | | |
| | Render background worker (`hasari-worker`) | Starter (512 MB) | $7 | | |
| | Render Postgres | Starter | $7 | | |
| | Render Redis | Starter | $10 | | |
| | AWS S3 (10 GB storage, 100k req/month) | Pay-as-you-go | ~$1 | | |
| | AWS data transfer (out) | Pay-as-you-go | ~$2 | | |
| | Custom domain | (you own it) | $0 | | |
| | Sentry (free tier) | Developer | $0 | | |
| | **Total** | | **~$34/month** | | |
| Scaling beyond ~500 inspections/day will require: | |
| - Larger Render plans (Standard: $25/service) | |
| - Moving ML to a GPU VPS (Hetzner GPU: $80/month) | |
| - S3 storage growth: $0.023/GB/month | |
| --- | |
| ## Troubleshooting | |
| ### `ml_loaded: false` at startup | |
| **Cause**: model weights missing or wrong path. | |
| **Fix**: ensure `services/ml/yolo11m-seg.pt`, `yolo11s-seg.pt`, `yolo11n-cls.pt` are committed to the repo or downloaded in the build step. Check `ML_MODEL_DIR` env var. | |
| ### 503 "Is kuyrugu su an kullanilamiyor" | |
| **Cause**: Celery worker can't reach Redis. | |
| **Fix**: confirm `REDIS_URL` is set on **hasari-worker** (not just api). Restart worker. | |
| ### Postgres connection limit exceeded | |
| **Cause**: too many open connections β usually a long-running query or leaked sessions. | |
| **Fix**: check Render β Postgres β Metrics β "Active connections". Restart API service to drop them. Add `pool_pre_ping=True` and `pool_recycle=300` to SQLAlchemy engine config. | |
| ### S3 403 on upload | |
| **Cause**: IAM policy doesn't grant `s3:PutObject`, or bucket name typo, or wrong region. | |
| **Fix**: run `aws s3 ls s3://your-bucket-name --region eu-central-1` from anywhere with the same credentials to verify. | |
| ### Web app CORS error | |
| **Cause**: `CORS_ORIGINS` doesn't include the actual origin (don't forget `https://`, no trailing slash). | |
| **Fix**: update env var, redeploy API. | |
| ### Health check passing but inspections always fail with 500 | |
| **Cause**: usually an unhandled exception in the ML pipeline (corrupt model, OOM, missing class). | |
| **Fix**: enable `LOG_LEVEL=DEBUG`, reproduce, read the traceback in logs. If it's OOM, upgrade to Standard plan or move ML off-box. | |
| ### Migration locked / `alembic upgrade head` hangs | |
| **Cause**: a previous migration left a lock in the `alembic_version` table. | |
| **Fix**: in psql, `DELETE FROM alembic_locks;` (table name varies β check your alembic config) or set `LOCK_TIMEOUT` and retry. | |
| --- | |
| ## Post-launch monitoring (first 48 hours) | |
| Watch these metrics every 4 hours for the first two days: | |
| - **Error rate** (Sentry): target < 0.5% of requests | |
| - **API latency p95** (Render metrics): target < 1.5 s for non-inspection endpoints | |
| - **Inspection latency p95**: target < 12 s end-to-end for 4-photo batches | |
| - **Database active connections**: target < 50% of pool max | |
| - **Redis memory**: target < 80% of plan limit | |
| - **Failed inspection rate**: target < 2% of jobs reaching `failed` | |
| If any metric exceeds target for 30+ minutes, treat as a P1 incident. | |
| --- | |
| ## Related docs | |
| - [API_GUIDE.md](API_GUIDE.md) β REST contract for smoke tests | |
| - [AUTH_FLOW.md](AUTH_FLOW.md) β token lifecycle (env vars relevant) | |
| - [LAUNCH_CHECKLIST.md](LAUNCH_CHECKLIST.md) β pre-go-live sign-off gates | |
| - [OBSERVABILITY_SETUP.md](OBSERVABILITY_SETUP.md) β Prometheus + Grafana wiring | |