
Deploy Guide: Render.com

End-to-end walkthrough for deploying Hasarİ to Render.com using render.yaml. Covers prerequisites, infra setup, environment configuration, the first deploy, smoke tests, monitoring, rollback, and cost.

Target audience: anyone with shell access and a Render account. No prior Render experience is required.


Prerequisites

Before you start, you need:

| Item | Why | How to get it |
|---|---|---|
| Render account | Hosts the API + worker | render.com (free tier is fine for the web service; Postgres/Redis are paid) |
| GitHub access to the repo | Render builds from git | arac-hasar-v2 repo permissions |
| AWS S3 bucket (or compatible) | Image storage (uploads + visualizations) | Or use Cloudflare R2 / Backblaze B2; anything S3-compatible works |
| AWS IAM access key with s3:GetObject, s3:PutObject, s3:DeleteObject on the bucket | Backend uploads/signs URLs | AWS console → IAM |
| Custom domain (optional) | Branded URL | Any registrar, point CNAME at Render |
| A strong JWT_SECRET_KEY | Signs auth tokens | openssl rand -base64 48 |
| Sentry DSN (optional) | Error tracking | sentry.io |
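
Where openssl is not at hand, a key of the same shape can be produced with Python's standard library. This is a minimal sketch; the function name generate_jwt_secret is illustrative and not part of the repo.

```python
import base64
import secrets


def generate_jwt_secret(num_bytes: int = 48) -> str:
    """Return num_bytes of OS-level randomness, base64-encoded.

    Mirrors `openssl rand -base64 48`: 48 random bytes encode to
    64 base64 characters with no padding.
    """
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    # Print one fresh key to paste into the Render environment settings.
    print(generate_jwt_secret())
```

Generate the value once and store it only in Render's environment settings; regenerating it later invalidates existing sessions.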

Time estimate: 45–90 minutes for the first deploy. Subsequent deploys are git-push fast.


Step 1: Provision infrastructure

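The render.yaml at the repo root declares the hasari-api and hasari-worker services. Render reads it when you connect the repo in step 2 and creates both in one go. The database, the Redis instance, and the S3 bucket are provisioned by hand first.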

1a. Create the Postgres database

In the Render dashboard:

  1. New + → PostgreSQL
  2. Name: hasari-db
  3. Database: hasari
  4. User: hasari
  5. Region: pick the one nearest your users (Frankfurt for EU/TR, Oregon for US)
  6. Plan: Starter ($7/month) is sufficient for the pilot. Free tier expires after 90 days; do not use it for production.
  7. Create database.

Copy the Internal Database URL (format: postgres://hasari:…@…/hasari). You'll paste it as DATABASE_URL in step 2.

1b. Create the Redis instance

  1. New + → Redis
  2. Name: hasari-redis
  3. Region: same as Postgres
  4. Plan: Starter ($10/month). Free tier has no persistence; do not use for production.
  5. Maxmemory policy: allkeys-lru
  6. Create Redis.

Copy the Internal Redis URL; you'll use it as REDIS_URL.

1c. Create the S3 bucket

In the AWS console:

  1. S3 → Create bucket → hasari-uploads-prod (or your name), region matching the API for low latency
  2. Block all public access: yes, keep all four boxes checked. The backend serves presigned URLs; no public listing.
  3. Versioning: disabled (uploads are immutable; no need for revisions)
  4. Server-side encryption: SSE-S3 (default) is fine
  5. After creation: Permissions β†’ CORS β†’ add:
[
  {
    "AllowedHeaders": ["*"],
    "AllowedMethods": ["GET", "PUT", "POST"],
    "AllowedOrigins": ["https://hasari.app", "https://hasari-api.onrender.com"],
    "ExposeHeaders": ["ETag", "x-amz-request-id"],
    "MaxAgeSeconds": 3000
  }
]

Replace the origins with your actual web app URL.

  6. IAM → create an IAM user hasari-backend, attach an inline policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::hasari-uploads-prod/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::hasari-uploads-prod"
    }
  ]
}

Create an access key for this user and save both halves now; you cannot retrieve the secret again.
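
If your bucket is not named hasari-uploads-prod, the inline policy above can be generated rather than hand-edited. The sketch below only reproduces the JSON shown in this step; build_backend_policy is a hypothetical helper name.

```python
import json


def build_backend_policy(bucket: str) -> dict:
    """Build the inline policy from this step for the given bucket name."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level actions apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # ListBucket is a bucket-level action, so no trailing /*.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_backend_policy("hasari-uploads-prod"), indent=2))
```

Paste the printed JSON into the IAM console's inline policy editor.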


Step 2: Configure the backend service

2a. Connect the repo

  1. Render dashboard → New + → Blueprint
  2. Connect your GitHub account, pick arac-hasar-v2
  3. Render detects render.yaml and shows the services it will create:
    • hasari-api: web service (FastAPI)
    • hasari-worker: background worker (Celery)
  4. Click Apply to create both.

2b. Set environment variables

For each of hasari-api and hasari-worker, go to Environment and add:

Required

| Name | Example | Description | Security note |
|---|---|---|---|
| ENVIRONMENT | production | Disables dev auth fallback | Never set to dev here |
| DATABASE_URL | postgres://… | From step 1a | Internal URL only; no external traffic |
| REDIS_URL | redis://… | From step 1b | Internal URL only |
| JWT_SECRET_KEY | (32+ char random string) | Signs JWTs | Generate fresh: openssl rand -base64 48. Rotating invalidates all existing sessions. |
| S3_BUCKET | hasari-uploads-prod | Bucket name | - |
| S3_REGION | eu-central-1 | AWS region | - |
| S3_ACCESS_KEY | AKIA… | From step 1c | Use a dedicated IAM user, not your root key |
| S3_SECRET_KEY | … | From step 1c | Mark this var as "secret" in Render UI |
| S3_ENDPOINT_URL | (blank for AWS) | Set only for R2/MinIO/B2 | - |
| CORS_ORIGINS | https://hasari.app,https://www.hasari.app | Comma-separated allowed web origins | Never use * in production |
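
A missing or mistyped variable is easier to catch before the first deploy than in the start log. The following preflight check is a sketch built only from the rules in the table above; the function name preflight and the idea of running it by hand are assumptions, not something the repo is known to ship.

```python
from typing import List, Mapping

# Variables the table above marks as required (S3_ENDPOINT_URL may stay blank).
REQUIRED = (
    "ENVIRONMENT", "DATABASE_URL", "REDIS_URL", "JWT_SECRET_KEY",
    "S3_BUCKET", "S3_REGION", "S3_ACCESS_KEY", "S3_SECRET_KEY", "CORS_ORIGINS",
)


def preflight(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the config looks sane."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]

    if env.get("ENVIRONMENT") and env["ENVIRONMENT"] != "production":
        problems.append("ENVIRONMENT must be 'production'")
    if env.get("JWT_SECRET_KEY") and len(env["JWT_SECRET_KEY"]) < 32:
        problems.append("JWT_SECRET_KEY is shorter than 32 characters")

    db_url = env.get("DATABASE_URL", "")
    if db_url and not db_url.startswith(("postgres://", "postgresql://")):
        problems.append("DATABASE_URL does not look like a Postgres URL")
    redis_url = env.get("REDIS_URL", "")
    if redis_url and not redis_url.startswith(("redis://", "rediss://")):
        problems.append("REDIS_URL does not look like a Redis URL")

    # A bare * entry would allow every origin.
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",")]
    if "*" in origins:
        problems.append("CORS_ORIGINS must not contain '*'")
    return problems
```

In a Render shell it could be called as preflight(os.environ) after importing os, printing each returned problem.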

Recommended

| Name | Default | Description |
|---|---|---|
| ACCESS_TOKEN_MINUTES | 30 | Short access TTL; keeps damage from a stolen token bounded |
| REFRESH_TOKEN_DAYS | 7 | Refresh TTL; balance UX vs. risk |
| MAX_IMAGE_SIZE_MB | 12 | Per-image upload limit |
| MAX_IMAGES_SYNC | 5 | Sync mode cap |
| MAX_IMAGES_ASYNC | 20 | Async mode cap |
| SENTRY_DSN | (blank) | Enable Sentry error tracking |
| LOG_LEVEL | INFO | DEBUG for troubleshooting; keep INFO in prod |

ML service

| Name | Default | Description |
|---|---|---|
| ML_MODEL_DIR | /app/models | Path to YOLO .pt weight files inside the container |
| ML_DEVICE | cpu | cuda requires a GPU instance (Render does not offer GPU; keep cpu on Render and offload heavy ML to a separate GPU host or external service for production) |

GPU note: Render does not currently offer GPU instances. For the pilot, the backend runs YOLO on CPU, which is roughly 5–10× slower than on a GPU. For production loads above ~50 inspections/hour, host the ML service separately on a GPU VPS (Hetzner, RunPod, etc.) and point ML_SERVICE_URL at it. The architecture diagram is in README.md.

2c. Build & start commands

If render.yaml is missing these, set them manually:

hasari-api (web service):

  • Build: pip install -r services/backend/requirements.txt
  • Start: cd services/backend && alembic upgrade head && uvicorn main:app --host 0.0.0.0 --port $PORT

hasari-worker (background worker):

  • Build: same
  • Start: cd services/backend && celery -A worker worker --loglevel=info --concurrency=2

Step 3: First deploy

  1. Both services auto-deploy on push to main. Trigger the first deploy:
    • Go to hasari-api → Manual Deploy → Deploy latest commit.
  2. Watch the build log. Expected duration: 8–15 minutes (downloads PyTorch, YOLO weights).
  3. The first start runs alembic upgrade head, which creates the DB schema.
  4. Once "Your service is live" appears, hit the health endpoint:
curl https://hasari-api.onrender.com/health

Expected:

{"status":"ok","ml_loaded":true,"timestamp":"2026-05-15T...","version":"0.1.0"}

If ml_loaded is false, check the start log for "ML pipeline init failed"; it usually means the model weights are missing from the container. Open a shell session on the service and run python -c "from ml_service import ml_pipeline; print(ml_pipeline.is_loaded())".
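
For scripted deploys, the same check can be automated. The sketch below inspects only the status and ml_loaded fields of the sample response above; health_problems and fetch_health are hypothetical names, and the 10-second timeout is an arbitrary choice.

```python
import json
import urllib.request
from typing import List


def health_problems(body: str) -> List[str]:
    """Inspect a /health response body and return a list of problems."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return ["response is not valid JSON"]
    if not isinstance(payload, dict):
        return ["response is not a JSON object"]

    problems = []
    if payload.get("status") != "ok":
        problems.append(f"status is {payload.get('status')!r}, expected 'ok'")
    if payload.get("ml_loaded") is not True:
        problems.append("ml_loaded is not true; check the start log")
    return problems


def fetch_health(base_url: str) -> str:
    """Fetch the raw /health body from a deployed instance."""
    with urllib.request.urlopen(base_url + "/health", timeout=10) as response:
        return response.read().decode("utf-8")
```

A deploy script could call health_problems(fetch_health("https://hasari-api.onrender.com")) and fail when the list is non-empty.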


Step 4: Create the admin user

The first admin must be created out-of-band; there is no admin-registration UI. Open a shell in the API container via the Render dashboard (Shell tab) and run:

cd services/backend
python -c "
import os
import psycopg
from database import init_db
from auth import _repo
from security import hash_password

# Make sure the schema exists before inserting.
init_db()
user = _repo.create(
    email='admin@yourcompany.com',
    password_hash=hash_password('CHANGE_ME_now_strong_password'),
    full_name='Admin',
)
# Promote the new user to admin. The URL is read from the environment inside
# Python instead of being interpolated into the source by the shell.
with psycopg.connect(os.environ['DATABASE_URL']) as conn:
    conn.execute('UPDATE users SET role=%s WHERE id=%s', ('admin', user['id']))
    conn.commit()
print('Admin user created:', user['email'])
"

Sign in immediately at https://hasari.app/login and rotate the password through the UI.


Step 5: Smoke test checklist

Before announcing the deploy is "done", run through this list. Each item should pass on the first try.

  • GET /health returns 200 with ml_loaded: true
  • GET /api/v1/version returns expected git SHA and build time
  • POST /auth/register with a new email returns 201 + token pair
  • POST /auth/login with that email returns 200 + new token pair
  • GET /auth/me with the access token returns the user
  • POST /auth/refresh with the refresh token returns a new token pair
  • POST /api/v1/inspect/sync with a 1MB JPG returns 200 with parts/damages JSON within 15 seconds
  • GET /api/v1/inspect returns the inspection in the list
  • GET /api/v1/inspect/{id}/visualization/annotated redirects to a presigned S3 URL that returns a PNG
  • DELETE /api/v1/inspect/{id} removes it (subsequent GET returns 404)
  • Web app loads at custom domain, language defaults to TR
  • Sign in via web app, complete one inspection end-to-end
  • Open Render logs: no ERROR or CRITICAL entries in the past hour
  • Postgres connection count < 20 (visible in Render → hasari-db → Metrics)

If any item fails, do not announce the launch. See Troubleshooting.
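
The unauthenticated items in the checklist lend themselves to a small script. The sketch below covers only GET /health and GET /api/v1/version; the authenticated and upload checks are left out because their request and response shapes are not documented here. All function names are hypothetical, and the fetcher is injectable so the logic can be exercised without a live service.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, List, Tuple

# A fetcher takes (method, path) and returns (status_code, body_text).
Fetcher = Callable[[str, str], Tuple[int, str]]

# (method, path, expected status) for the unauthenticated checklist items.
CHECKS = [
    ("GET", "/health", 200),
    ("GET", "/api/v1/version", 200),
]


def run_smoke_checks(fetch: Fetcher) -> List[str]:
    """Run each check and return descriptions of the failures."""
    failures = []
    for method, path, expected in CHECKS:
        try:
            status, body = fetch(method, path)
        except Exception as exc:  # network errors count as failures
            failures.append(f"{method} {path}: {exc}")
            continue
        if status != expected:
            failures.append(f"{method} {path}: got {status}, expected {expected}")
        elif path == "/health":
            try:
                loaded = json.loads(body).get("ml_loaded") is True
            except (ValueError, AttributeError):
                loaded = False
            if not loaded:
                failures.append("GET /health: ml_loaded is not true")
    return failures


def http_fetcher(base_url: str) -> Fetcher:
    """Build a fetcher backed by urllib for a deployed instance."""
    def fetch(method: str, path: str) -> Tuple[int, str]:
        request = urllib.request.Request(base_url + path, method=method)
        try:
            with urllib.request.urlopen(request, timeout=15) as response:
                return response.status, response.read().decode("utf-8")
        except urllib.error.HTTPError as err:
            # Non-2xx responses still carry a status worth reporting.
            return err.code, err.read().decode("utf-8", "replace")
    return fetch
```

Against a live deploy, run_smoke_checks(http_fetcher("https://hasari-api.onrender.com")) returns an empty list when both checks pass.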


Monitoring & log access

Logs

  • Render dashboard: hasari-api → Logs tab for a live tail.
  • CLI: render logs --service hasari-api --tail (install render-cli).
  • Structured JSON: every log line is {"time":..., "level":..., "logger":..., "msg":...}; pipe to jq for filtering.
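
Where jq is not installed, the same filtering can be done in Python. This sketch assumes only the four-field line shape described above and standard level names; filter_logs is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator

# Numeric ranks so that "at or above ERROR" is a simple comparison.
LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}


def filter_logs(lines: Iterable[str], min_level: str = "ERROR") -> Iterator[dict]:
    """Yield parsed log records at or above min_level, skipping non-JSON lines."""
    threshold = LEVELS[min_level]
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. platform banners that are not structured
        if not isinstance(record, dict):
            continue
        level = str(record.get("level", "")).upper()
        # Unknown levels rank as 0 and are filtered out.
        if LEVELS.get(level, 0) >= threshold:
            yield record
```

Fed with sys.stdin, it can sit at the end of a log-tailing pipeline and print only the records that need attention.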

Metrics

  • Render built-in: CPU, memory, response time, throughput visible per service in the dashboard.
  • Prometheus: scrape https://hasari-api.onrender.com/metrics (requires Authorization: Bearer <admin token>). See observability/ for a Grafana dashboard JSON to import.

Alerts

Configure in Render → service → Notifications:

  • Deploy failed → Slack/email
  • Service crashed → on-call rotation
  • Disk usage > 80% → Slack

For app-level alerts (error rate > 1%, p95 latency > 3s), set up Sentry alerts on the SENTRY_DSN project.


Rolling back a bad deploy

If the latest deploy is broken:

  1. Render dashboard → hasari-api → Events tab.
  2. Find the last known-good deploy (green checkmark).
  3. Click Rollback to this deploy.
  4. Confirm. Render redeploys the previous Docker image, which takes ~30 seconds.

For database migrations that cannot be rolled back automatically:

# In the Render shell:
cd services/backend
alembic downgrade -1

Important: alembic downgrade cannot restore data. Downgrading a migration that dropped a column recreates the column empty, and downgrading one that added a column discards whatever was written to it. Before launch, test every migration's downgrade() against a copy of production data.


Cost estimate (monthly, pilot scale)

| Item | Plan | Cost |
|---|---|---|
| Render web service (hasari-api) | Starter (512 MB) | $7 |
| Render background worker (hasari-worker) | Starter (512 MB) | $7 |
| Render Postgres | Starter | $7 |
| Render Redis | Starter | $10 |
| AWS S3 (10 GB storage, 100k req/month) | Pay-as-you-go | ~$1 |
| AWS data transfer (out) | Pay-as-you-go | ~$2 |
| Custom domain | (you own it) | $0 |
| Sentry (free tier) | Developer | $0 |
| Total | | ~$34/month |

Scaling beyond ~500 inspections/day will require:

  • Larger Render plans (Standard: $25/service)
  • Moving ML to a GPU VPS (Hetzner GPU: $80/month)
  • S3 storage growth: $0.023/GB/month
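
The arithmetic behind the table and the storage-growth line can be reproduced with a few lines of Python, which is handy when plan prices change. The figures are copied from this section; the names are illustrative.

```python
# Monthly pilot costs in USD, taken from the table above.
PILOT_COSTS_USD = {
    "hasari-api (Starter)": 7,
    "hasari-worker (Starter)": 7,
    "Postgres (Starter)": 7,
    "Redis (Starter)": 10,
    "S3 storage and requests": 1,
    "AWS data transfer out": 2,
}

# Storage price quoted in the scaling list above.
S3_USD_PER_GB_MONTH = 0.023


def pilot_total() -> int:
    """Sum of the monthly line items."""
    return sum(PILOT_COSTS_USD.values())


def s3_storage_cost(gigabytes: float) -> float:
    """Monthly storage-only cost for the given volume, rounded to cents."""
    return round(gigabytes * S3_USD_PER_GB_MONTH, 2)


if __name__ == "__main__":
    print(f"Pilot total: ${pilot_total()}/month")
    print(f"Storage for 500 GB: ${s3_storage_cost(500)}/month")
```

The storage function covers storage only; request and transfer charges are billed separately.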

Troubleshooting

ml_loaded: false at startup

Cause: model weights missing or wrong path. Fix: ensure services/ml/yolo11m-seg.pt, yolo11s-seg.pt, yolo11n-cls.pt are committed to the repo or downloaded in the build step. Check ML_MODEL_DIR env var.

503 "Is kuyrugu su an kullanilamiyor" ("Job queue is currently unavailable")

Cause: Celery worker can't reach Redis. Fix: confirm REDIS_URL is set on hasari-worker (not just api). Restart worker.

Postgres connection limit exceeded

Cause: too many open connections, usually from a long-running query or leaked sessions. Fix: check Render → Postgres → Metrics → "Active connections". Restart the API service to drop them. Add pool_pre_ping=True and pool_recycle=300 to the SQLAlchemy engine config.
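
The two pool settings are keyword arguments to SQLAlchemy's create_engine. The sketch below shows one way to wire them in; where the backend actually builds its engine is not shown in this guide, so the function names are hypothetical. SQLAlchemy 1.4 and later no longer recognize the postgres:// scheme shown in step 1a, so the sketch also rewrites it to postgresql://.

```python
def engine_options() -> dict:
    """Pool settings from the fix above, as keyword arguments for create_engine."""
    return {
        # Ping each pooled connection before use so that connections
        # dropped by a restart are replaced transparently.
        "pool_pre_ping": True,
        # Recycle connections that have been open for more than 300 seconds.
        "pool_recycle": 300,
    }


def normalize_db_url(url: str) -> str:
    """Rewrite a postgres:// URL to the postgresql:// form SQLAlchemy expects."""
    prefix = "postgres://"
    if url.startswith(prefix):
        return "postgresql://" + url[len(prefix):]
    return url


def make_engine(database_url: str):
    """Create the engine; SQLAlchemy is imported lazily to keep the helpers testable."""
    from sqlalchemy import create_engine

    return create_engine(normalize_db_url(database_url), **engine_options())
```

Keeping the options in one function makes it easy to confirm that the API and the worker use the same pool configuration.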

S3 403 on upload

Cause: IAM policy doesn't grant s3:PutObject, or bucket name typo, or wrong region. Fix: run aws s3 ls s3://your-bucket-name --region eu-central-1 from anywhere with the same credentials to verify.

Web app CORS error

Cause: CORS_ORIGINS doesn't include the actual origin (don't forget https://, no trailing slash). Fix: update env var, redeploy API.
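
Most CORS_ORIGINS mistakes are mechanical (missing scheme, trailing slash, stray wildcard) and can be caught before redeploying. The validator below is a sketch of those three rules only; parse_cors_origins is a hypothetical name and does not reflect how the backend parses the variable.

```python
from typing import List
from urllib.parse import urlsplit


def parse_cors_origins(raw: str) -> List[str]:
    """Split a comma-separated CORS_ORIGINS value and reject malformed entries."""
    origins = []
    for item in raw.split(","):
        origin = item.strip()
        if not origin:
            continue  # tolerate a trailing comma
        if origin == "*":
            raise ValueError("wildcard origin is not allowed in production")
        parts = urlsplit(origin)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"{origin!r} needs an explicit scheme such as https://")
        if parts.path or parts.query or parts.fragment:
            raise ValueError(f"{origin!r} must not have a path or trailing slash")
        origins.append(origin)
    return origins
```

For example, parse_cors_origins("https://hasari.app,https://www.hasari.app") returns both origins, while "https://hasari.app/" raises ValueError because of the trailing slash.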

Health check passing but inspections always fail with 500

Cause: usually an unhandled exception in the ML pipeline (corrupt model, OOM, missing class). Fix: enable LOG_LEVEL=DEBUG, reproduce, read the traceback in logs. If it's OOM, upgrade to Standard plan or move ML off-box.

Migration locked / alembic upgrade head hangs

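Cause: stock Alembic keeps no lock table of its own (alembic_version only stores the current revision), so a hang usually means another session holds a Postgres lock the migration needs, for example a connection left idle in transaction by an interrupted deploy. Fix: in psql, find the blocking session in pg_stat_activity and end it with SELECT pg_terminate_backend(<pid>);, or set lock_timeout so the migration fails fast, then retry. If your setup adds a custom lock table (the name varies, e.g. alembic_locks; check your Alembic config), clear it with DELETE FROM alembic_locks;.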


Post-launch monitoring (first 48 hours)

Watch these metrics every 4 hours for the first two days:

  • Error rate (Sentry): target < 0.5% of requests
  • API latency p95 (Render metrics): target < 1.5 s for non-inspection endpoints
  • Inspection latency p95: target < 12 s end-to-end for 4-photo batches
  • Database active connections: target < 50% of pool max
  • Redis memory: target < 80% of plan limit
  • Failed inspection rate: target < 2% of jobs reaching failed

If any metric exceeds target for 30+ minutes, treat as a P1 incident.
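
The 30-minute rule can be made precise in code. The sketch below computes a nearest-rank p95 from raw latency samples and flags a breach sustained over consecutive readings; it assumes evenly spaced readings, treats each reading as covering one interval, and uses hypothetical names throughout.

```python
from typing import Sequence


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sample."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # 1-based nearest rank: ceil(0.95 * n), computed with integers.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def sustained_breach(values: Sequence[float], target: float,
                     interval_minutes: int, window_minutes: int = 30) -> bool:
    """True if readings stayed above target for window_minutes in a row.

    values are evenly spaced readings taken every interval_minutes.
    """
    # Number of consecutive readings that span the window (ceiling division).
    needed = -(-window_minutes // interval_minutes)
    streak = 0
    for value in values:
        streak = streak + 1 if value > target else 0
        if streak >= needed:
            return True
    return False
```

With p95 readings taken every 5 minutes, sustained_breach(readings, 1.5, 5) corresponds to the 1.5 s API latency target above being exceeded for 30 minutes.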


Related docs