Deploy Guide — Render.com

End-to-end walkthrough for deploying Hasarİ to Render.com using render.yaml. Covers prerequisites, infra setup, environment configuration, the first deploy, smoke tests, monitoring, rollback, and cost.

Target audience: anyone with shell access and a Render account, no prior Render experience required.

Prerequisites

Before you start, you need:

Item	Why	How to get it
Render account	Hosts the API + worker	render.com — free tier ok for the web service; Postgres/Redis are paid
GitHub access to the repo	Render builds from git	`arac-hasar-v2` repo permissions
AWS S3 bucket (or compatible)	Image storage (uploads + visualizations)	Or use Cloudflare R2 / Backblaze B2 — anything S3-compatible
AWS IAM access key with `s3:GetObject`, `s3:PutObject`, `s3:DeleteObject` on the bucket	Backend uploads/signs URLs	AWS console → IAM
Custom domain (optional)	Branded URL	Any registrar, point CNAME at Render
A strong `JWT_SECRET_KEY`	Signs auth tokens	`openssl rand -base64 48`
Sentry DSN (optional)	Error tracking	sentry.io

Time estimate: 45–90 minutes for the first deploy. Subsequent deploys are git-push fast.
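
The `openssl rand -base64 48` command from the prerequisites table has a direct Python equivalent, which may be handy on machines without OpenSSL. This is a minimal sketch; the function name `generate_jwt_secret` is illustrative and not part of the repo.

```python
import base64
import secrets


def generate_jwt_secret(num_bytes: int = 48) -> str:
    """Return num_bytes of OS-level randomness, base64-encoded.

    With the default of 48 bytes this mirrors `openssl rand -base64 48`:
    the result is a 64-character string with no padding.
    """
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


# Print a fresh secret to paste into JWT_SECRET_KEY.
print(generate_jwt_secret())
```

Generate the value once and store it in Render; regenerating it later invalidates every existing session.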

Step 1 — Provision infrastructure

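The render.yaml at the repo root declares the two application services (hasari-api and hasari-worker). Render reads it when you connect the repo in step 2 and creates both in one go. The database, Redis instance, and S3 bucket below are created by hand first.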

1a. Create the Postgres database

In the Render dashboard:

New + → PostgreSQL
Name: hasari-db
Database: hasari
User: hasari
Region: pick the one nearest your users (Frankfurt for EU/TR, Oregon for US)
Plan: Starter ($7/month) is sufficient for the pilot. Free tier expires after 90 days — do not use it for production.
Create database.

Copy the Internal Database URL (format: postgres://hasari:…@…/hasari). You'll paste it as DATABASE_URL in step 2.
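
The URL format shown above uses the postgres:// scheme. libpq-based drivers such as psycopg accept it, but SQLAlchemy 1.4 and later only recognize postgresql://. If the backend hands DATABASE_URL to SQLAlchemy (an assumption; this guide does not show the engine setup), a small normalization step along these lines avoids a startup error. `normalize_database_url` is a hypothetical helper.

```python
def normalize_database_url(url: str) -> str:
    """Rewrite the legacy postgres:// scheme to postgresql://.

    URLs that already use another scheme are returned unchanged.
    """
    legacy = "postgres://"
    if url.startswith(legacy):
        return "postgresql://" + url[len(legacy):]
    return url


# Placeholder credentials and host, for illustration only.
print(normalize_database_url("postgres://hasari:secret@db-host/hasari"))
```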

1b. Create the Redis instance

New + → Redis
Name: hasari-redis
Region: same as Postgres
Plan: Starter ($10/month). Free tier has no persistence; do not use for production.
Maxmemory policy: allkeys-lru
Create Redis.

Copy the Internal Redis URL — you'll use it as REDIS_URL.

1c. Create the S3 bucket

In the AWS console:

S3 → Create bucket → hasari-uploads-prod (or your name), region matching the API for low latency
Block all public access — yes, keep all four boxes checked. The backend serves presigned URLs; no public listing.
Versioning: disabled (uploads are immutable; no need for revisions)
Server-side encryption: SSE-S3 (default) is fine
After creation: Permissions → CORS → add:

[
  {
    "AllowedHeaders": ["*"],
    "AllowedMethods": ["GET", "PUT", "POST"],
    "AllowedOrigins": ["https://hasari.app", "https://hasari-api.onrender.com"],
    "ExposeHeaders": ["ETag", "x-amz-request-id"],
    "MaxAgeSeconds": 3000
  }
]

Replace the origins with your actual web app URL.

IAM → create an IAM user hasari-backend, attach an inline policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::hasari-uploads-prod/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::hasari-uploads-prod"
    }
  ]
}

Create an access key for this user — save both halves now, you cannot retrieve the secret again.
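
If your bucket is not named hasari-uploads-prod, both Resource ARNs in the policy must change together. The sketch below generates the same policy for any bucket name; `build_bucket_policy` is an illustrative helper, not something shipped in the repo.

```python
import json


def build_bucket_policy(bucket: str) -> dict:
    """Build the inline IAM policy from step 1c for the given bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                # Object-level actions apply to keys inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                # ListBucket is a bucket-level action, so there is no /* suffix.
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


# Emit JSON that can be pasted into the IAM inline policy editor.
print(json.dumps(build_bucket_policy("hasari-uploads-prod"), indent=2))
```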

Step 2 — Configure the backend service

2a. Connect the repo

Render dashboard → New + → Blueprint
Connect your GitHub account, pick arac-hasar-v2
Render detects render.yaml and shows the services it will create:
- hasari-api — web service (FastAPI)
- hasari-worker — background worker (Celery)
Click Apply to create both.

2b. Set environment variables

For each of hasari-api and hasari-worker, go to Environment and add:

Required

Name	Example	Description	Security note
`ENVIRONMENT`	`production`	Disables dev auth fallback	Never set to `dev` here
`DATABASE_URL`	`postgres://…`	From step 1a	Internal URL only; no external traffic
`REDIS_URL`	`redis://…`	From step 1b	Internal URL only
`JWT_SECRET_KEY`	(32+ char random string)	Signs JWTs	Generate fresh: `openssl rand -base64 48`. Rotating invalidates all existing sessions.
`S3_BUCKET`	`hasari-uploads-prod`	Bucket name	—
`S3_REGION`	`eu-central-1`	AWS region	—
`S3_ACCESS_KEY`	`AKIA…`	From step 1c	Use a dedicated IAM user, not your root key
`S3_SECRET_KEY`	`…`	From step 1c	Mark this var as "secret" in Render UI
`S3_ENDPOINT_URL`	(blank for AWS)	Set only for R2/MinIO/B2	—
`CORS_ORIGINS`	`https://hasari.app,https://www.hasari.app`	Comma-separated allowed web origins	Never use `*` in production
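
A missing required variable usually surfaces only as a crash during the first deploy. One way to catch it earlier is a check like the following, run in the service shell or at startup. It is a sketch: `find_config_problems` is a hypothetical helper, and the rules encode only what the table above states (S3_ENDPOINT_URL is left out because it is blank for AWS).

```python
from typing import List, Mapping

# Names taken from the "Required" table.
REQUIRED_VARS = (
    "ENVIRONMENT", "DATABASE_URL", "REDIS_URL", "JWT_SECRET_KEY",
    "S3_BUCKET", "S3_REGION", "S3_ACCESS_KEY", "S3_SECRET_KEY",
    "CORS_ORIGINS",
)


def find_config_problems(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the config looks sane."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    environment = env.get("ENVIRONMENT")
    if environment and environment != "production":
        problems.append("ENVIRONMENT should be 'production'")
    secret = env.get("JWT_SECRET_KEY")
    if secret and len(secret) < 32:
        problems.append("JWT_SECRET_KEY is shorter than 32 characters")
    return problems
```

Calling `find_config_problems(os.environ)` on both hasari-api and hasari-worker covers the case where a variable was set on only one of the two services.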

Optional

Name	Default	Description
`ACCESS_TOKEN_MINUTES`	`30`	Short access TTL — keeps damage from a stolen token bounded
`REFRESH_TOKEN_DAYS`	`7`	Refresh TTL — balance UX vs. risk
`MAX_IMAGE_SIZE_MB`	`12`	Per-image upload limit
`MAX_IMAGES_SYNC`	`5`	Sync mode cap
`MAX_IMAGES_ASYNC`	`20`	Async mode cap
`SENTRY_DSN`	(blank)	Enable Sentry error tracking
`LOG_LEVEL`	`INFO`	`DEBUG` for troubleshooting; keep `INFO` in prod
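
To make the defaults in this table concrete, the sketch below loads the same variables with the same fallbacks. The class and function names are hypothetical, and the backend's real settings module may be organized differently.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class OptionalSettings:
    """Defaults copied from the table above."""
    access_token_minutes: int = 30
    refresh_token_days: int = 7
    max_image_size_mb: int = 12
    max_images_sync: int = 5
    max_images_async: int = 20
    sentry_dsn: str = ""
    log_level: str = "INFO"


def _int_or_default(env: Mapping[str, str], name: str, default: int) -> int:
    """Parse an integer variable, falling back when it is unset or blank."""
    raw = env.get(name, "").strip()
    return int(raw) if raw else default


def load_optional_settings(env: Mapping[str, str]) -> OptionalSettings:
    """Read the optional variables from env, applying the documented defaults."""
    d = OptionalSettings()
    return OptionalSettings(
        access_token_minutes=_int_or_default(env, "ACCESS_TOKEN_MINUTES", d.access_token_minutes),
        refresh_token_days=_int_or_default(env, "REFRESH_TOKEN_DAYS", d.refresh_token_days),
        max_image_size_mb=_int_or_default(env, "MAX_IMAGE_SIZE_MB", d.max_image_size_mb),
        max_images_sync=_int_or_default(env, "MAX_IMAGES_SYNC", d.max_images_sync),
        max_images_async=_int_or_default(env, "MAX_IMAGES_ASYNC", d.max_images_async),
        sentry_dsn=env.get("SENTRY_DSN", "").strip(),
        log_level=env.get("LOG_LEVEL", "").strip().upper() or d.log_level,
    )
```

A non-numeric value such as `MAX_IMAGES_SYNC=five` raises ValueError here, which fails the deploy loudly instead of silently using the default.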

ML service

Name	Default	Description
`ML_MODEL_DIR`	`/app/models`	Path to YOLO `.pt` weight files inside the container
`ML_DEVICE`	`cpu`	`cuda` requires a GPU instance (Render does not offer GPU — keep CPU on Render and offload heavy ML to a separate GPU host or external service for production)

GPU note: Render does not currently offer GPU instances. For the pilot, the backend runs YOLO on CPU, which is roughly 5–10× slower than on a GPU. For production loads above ~50 inspections/hour, host the ML service separately on a GPU VPS (Hetzner, RunPod, etc.) and point ML_SERVICE_URL at it. See the architecture diagram in README.md.

2c. Build & start commands

If render.yaml is missing these, set them manually:

hasari-api (web service):

Build: pip install -r services/backend/requirements.txt
Start: cd services/backend && alembic upgrade head && uvicorn main:app --host 0.0.0.0 --port $PORT

hasari-worker (background worker):

Build: same
Start: cd services/backend && celery -A worker worker --loglevel=info --concurrency=2

Step 3 — First deploy

Both services auto-deploy on push to main. Trigger the first deploy:
- Go to hasari-api → Manual Deploy → Deploy latest commit.
Watch the build log. Expected duration: 8–15 minutes (downloads PyTorch, YOLO weights).
The first start runs alembic upgrade head — DB schema is created.
Once "Your service is live" appears, hit the health endpoint:

curl https://hasari-api.onrender.com/health

Expected:

{"status":"ok","ml_loaded":true,"timestamp":"2026-05-15T...","version":"0.1.0"}

If ml_loaded is false, check the start log for "ML pipeline init failed". This usually means model weights are missing from the container. To confirm, open a shell on the service (Shell tab in the Render dashboard) and run python -c "from ml_service import ml_pipeline; print(ml_pipeline.is_loaded())".
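
The health check can also be scripted so it fits into a post-deploy job. The sketch below separates fetching from validation; `fetch_health` and `health_problems` are illustrative names, and only the status and ml_loaded fields from the expected response above are checked.

```python
import json
from typing import List
from urllib.request import urlopen


def fetch_health(base_url: str, timeout: float = 10.0) -> str:
    """GET <base_url>/health and return the raw response body."""
    with urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def health_problems(body: str) -> List[str]:
    """Validate a /health response body; an empty list means healthy."""
    payload = json.loads(body)
    problems = []
    if payload.get("status") != "ok":
        problems.append(f"status is {payload.get('status')!r}, expected 'ok'")
    if payload.get("ml_loaded") is not True:
        problems.append("ml_loaded is not true; check the start log")
    return problems
```

Usage would look like `health_problems(fetch_health("https://hasari-api.onrender.com"))`, with a non-empty result treated as a failed deploy.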

Step 4 — Create the admin user

The first admin must be created out-of-band — there is no admin-registration UI. SSH into the API container via the Render dashboard (Shell tab) and run:

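cd services/backend
python -c "
import os
import psycopg
from database import init_db
from auth import _repo
from security import hash_password

init_db()  # make sure the schema exists before inserting
user = _repo.create(
    email='admin@yourcompany.com',
    password_hash=hash_password('CHANGE_ME_now_strong_password'),
    full_name='Admin',
)
# Promote the new user to admin. The URL is read from the environment
# instead of being interpolated into the Python source by the shell,
# so quotes or other special characters in it cannot break the script.
with psycopg.connect(os.environ['DATABASE_URL']) as conn:
    conn.execute('UPDATE users SET role=%s WHERE id=%s', ('admin', user['id']))
    conn.commit()
print('Admin user created:', user['email'])
"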

Step 5 — Smoke test checklist

Before announcing the deploy is "done", run through this list. Each item should pass on the first try.

GET /health returns 200 with ml_loaded: true
GET /api/v1/version returns expected git SHA and build time
POST /auth/register with a new email returns 201 + token pair
POST /auth/login with that email returns 200 + new token pair
GET /auth/me with the access token returns the user
POST /auth/refresh with the refresh token returns a new token pair
POST /api/v1/inspect/sync with a 1 MB JPG returns 200 with parts/damages JSON within 15 seconds
GET /api/v1/inspect returns the inspection in the list
GET /api/v1/inspect/{id}/visualization/annotated redirects to a presigned S3 URL that returns a PNG
DELETE /api/v1/inspect/{id} removes it (subsequent GET returns 404)
Web app loads at custom domain, language defaults to TR
Sign in via web app, complete one inspection end-to-end
Open Render logs — no ERROR or CRITICAL entries in the past hour
Postgres connection count < 20 (visible in Render → hasari-db → Metrics)

If any item fails, do not announce the launch. See Troubleshooting.

Monitoring & log access

Logs

Render dashboard: hasari-api → Logs tab — live tail.
CLI: render logs --service hasari-api --tail (install render-cli).
Structured JSON: every log line is {"time":..., "level":..., "logger":..., "msg":...} — pipe to jq for filtering.
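
Where jq is not available, the same filtering can be done in a few lines of Python. This sketch assumes the level field uses the standard Python logging level names, which the guide does not state explicitly; `filter_logs` is a hypothetical helper.

```python
import json
from typing import Dict, Iterable, Iterator

# Assumed level names, ordered by severity.
LEVEL_ORDER = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}


def filter_logs(lines: Iterable[str], min_level: str = "ERROR") -> Iterator[Dict]:
    """Yield parsed log records at or above min_level."""
    threshold = LEVEL_ORDER[min_level]
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as platform banners
        if not isinstance(record, dict):
            continue
        level = str(record.get("level", "")).upper()
        # Unknown or missing levels rank as 0 and are filtered out.
        if LEVEL_ORDER.get(level, 0) >= threshold:
            yield record
```

Passing `sys.stdin` as `lines` lets the function sit at the end of a pipe from whatever command tails the logs.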

Metrics

Render built-in: CPU, memory, response time, throughput visible per service in the dashboard.
Prometheus: scrape https://hasari-api.onrender.com/metrics (requires Authorization: Bearer <admin token>). See observability/ for a Grafana dashboard JSON to import.

Alerts

Configure in Render → service → Notifications:

Deploy failed → Slack/email
Service crashed → on-call rotation
Disk usage > 80% → Slack

For app-level alerts (error rate > 1%, p95 latency > 3s), set up Sentry alerts on the SENTRY_DSN project.

Rolling back a bad deploy

If the latest deploy is broken:

Render dashboard → hasari-api → Events tab.
Find the last known-good deploy (green checkmark).
Click Rollback to this deploy.
Confirm. Render redeploys the previous Docker image — takes ~30 seconds.

For database migrations that cannot be rolled back automatically:

# In the Render shell:
cd services/backend
alembic downgrade -1

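Important: never run alembic downgrade when the migration's downgrade() drops a column or table that holds live data; that data is lost. A migration whose upgrade() dropped a column cannot be undone this way either, because downgrading only recreates the column empty. Before launch, test every migration's downgrade() against a copy of production data.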

Cost estimate (monthly, pilot scale)

Item	Plan	Cost
Render web service (`hasari-api`)	Starter (512 MB)	$7
Render background worker (`hasari-worker`)	Starter (512 MB)	$7
Render Postgres	Starter	$7
Render Redis	Starter	$10
AWS S3 (10 GB storage, 100k req/month)	Pay-as-you-go	~$1
AWS data transfer (out)	Pay-as-you-go	~$2
Custom domain	(you own it)	$0
Sentry (free tier)	Developer	$0
Total		~$34/month
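
The total is a plain sum of the rows above, which makes it easy to recompute when a plan changes. A minimal sketch with the table's figures hard-coded:

```python
# Monthly cost per line item in USD, copied from the table above.
MONTHLY_COSTS_USD = {
    "Render web service (hasari-api)": 7,
    "Render background worker (hasari-worker)": 7,
    "Render Postgres": 7,
    "Render Redis": 10,
    "AWS S3 (10 GB storage, 100k req/month)": 1,
    "AWS data transfer (out)": 2,
    "Custom domain": 0,
    "Sentry (free tier)": 0,
}

total = sum(MONTHLY_COSTS_USD.values())
print(f"Total: ~${total}/month")  # prints: Total: ~$34/month
```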

Scaling beyond ~500 inspections/day will require:

Larger Render plans (Standard: $25/service)
Moving ML to a GPU VPS (Hetzner GPU: $80/month)
S3 storage growth: $0.023/GB/month

Troubleshooting

`ml_loaded: false` at startup

Cause: model weights missing or wrong path. Fix: ensure services/ml/yolo11m-seg.pt, yolo11s-seg.pt, yolo11n-cls.pt are committed to the repo or downloaded in the build step. Check ML_MODEL_DIR env var.
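
A quick way to confirm which weight files are missing is to check the directory from the service shell. The sketch below assumes the three files named above are expected directly inside ML_MODEL_DIR; `missing_weights` is an illustrative helper.

```python
from pathlib import Path
from typing import List

# File names taken from the fix described above.
EXPECTED_WEIGHTS = ("yolo11m-seg.pt", "yolo11s-seg.pt", "yolo11n-cls.pt")


def missing_weights(model_dir: str) -> List[str]:
    """Return the expected weight files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_WEIGHTS if not (root / name).is_file()]
```

For example, `missing_weights(os.environ.get("ML_MODEL_DIR", "/app/models"))` returns an empty list when all three files are present.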

503 "Is kuyrugu su an kullanilamiyor" ("Job queue is currently unavailable")

Cause: Celery worker can't reach Redis. Fix: confirm REDIS_URL is set on hasari-worker (not just api). Restart worker.

Postgres connection limit exceeded

Cause: too many open connections — usually a long-running query or leaked sessions. Fix: check Render → Postgres → Metrics → "Active connections". Restart API service to drop them. Add pool_pre_ping=True and pool_recycle=300 to SQLAlchemy engine config.
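
The two pool options mentioned in the fix are keyword arguments to SQLAlchemy's create_engine. The sketch below shows one way to pass them; it assumes SQLAlchemy is installed in the backend image, and `pool_kwargs` and `make_engine` are hypothetical names rather than the repo's actual engine setup.

```python
from typing import Any, Dict


def pool_kwargs(recycle_seconds: int = 300) -> Dict[str, Any]:
    """Engine options that guard against stale pooled connections."""
    return {
        # Test each connection on checkout and replace it if it is dead.
        "pool_pre_ping": True,
        # Discard connections older than this many seconds.
        "pool_recycle": recycle_seconds,
    }


def make_engine(database_url: str):
    """Create an engine with the options above (requires SQLAlchemy)."""
    # Imported lazily so this module loads even where SQLAlchemy is absent.
    from sqlalchemy import create_engine

    return create_engine(database_url, **pool_kwargs())
```

These options handle dead or stale connections. They do not cap the total number of connections, which is governed by pool_size and max_overflow on the same call.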

S3 403 on upload

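Cause: IAM policy doesn't grant s3:PutObject, or bucket name typo, or wrong region. Fix: run aws s3 ls s3://your-bucket-name --region eu-central-1 from any machine with the same credentials to confirm the credentials, bucket name, and region. That listing only exercises s3:ListBucket, so also try uploading a small file with aws s3 cp to confirm s3:PutObject.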

Web app CORS error

Cause: CORS_ORIGINS doesn't include the actual origin (don't forget https://, no trailing slash). Fix: update env var, redeploy API.
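
The formatting rules for CORS_ORIGINS (https:// prefix, no trailing slash, no wildcard) are easy to check mechanically before redeploying. A minimal sketch; `parse_cors_origins` is a hypothetical helper and the backend's own parsing may differ.

```python
from typing import List
from urllib.parse import urlsplit


def parse_cors_origins(raw: str) -> List[str]:
    """Split a comma-separated origin list and reject malformed entries."""
    origins = [item.strip() for item in raw.split(",") if item.strip()]
    for origin in origins:
        if origin == "*":
            raise ValueError("wildcard origin is not allowed in production")
        parts = urlsplit(origin)
        if parts.scheme != "https" or not parts.netloc:
            raise ValueError(f"{origin!r} must start with https://")
        # A bare origin has no path; even a single trailing slash counts as one.
        if parts.path or parts.query or parts.fragment:
            raise ValueError(f"{origin!r} must not have a path or trailing slash")
    return origins
```

The browser sends the Origin header without a trailing slash, so `https://hasari.app/` in the variable never matches and is rejected here for that reason.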

Health check passing but inspections always fail with 500

Cause: usually an unhandled exception in the ML pipeline (corrupt model, OOM, missing class). Fix: enable LOG_LEVEL=DEBUG, reproduce, read the traceback in logs. If it's OOM, upgrade to Standard plan or move ML off-box.

Migration locked / `alembic upgrade head` hangs

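Cause: another session, often an interrupted earlier migration or a connection left idle in a transaction, holds a lock on a table the migration needs. Stock Alembic keeps only the alembic_version table and does not create a lock table of its own, so check your Alembic config before deleting anything. Fix: in psql, look for the blocking session in pg_stat_activity and end it with SELECT pg_terminate_backend(pid); or set lock_timeout so the migration fails fast, then retry.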

Post-launch monitoring (first 48 hours)

Watch these metrics every 4 hours for the first two days:

Error rate (Sentry): target < 0.5% of requests
API latency p95 (Render metrics): target < 1.5 s for non-inspection endpoints
Inspection latency p95: target < 12 s end-to-end for 4-photo batches
Database active connections: target < 50% of pool max
Redis memory: target < 80% of plan limit
Failed inspection rate: target < 2% of jobs reaching failed

If any metric exceeds target for 30+ minutes, treat as a P1 incident.
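
The targets above can be encoded as a small table so that each four-hourly check is mechanical. This sketch uses made-up metric keys and flags a single sample only; the 30-minute persistence rule is not modeled and would need a series of samples.

```python
from typing import List, Mapping

# Upper bounds from the list above; the key names are illustrative.
TARGETS = {
    "error_rate_pct": 0.5,
    "api_latency_p95_s": 1.5,
    "inspection_latency_p95_s": 12.0,
    "db_connections_pct_of_pool": 50.0,
    "redis_memory_pct_of_limit": 80.0,
    "failed_inspection_rate_pct": 2.0,
}


def breached(metrics: Mapping[str, float]) -> List[str]:
    """Return the names of metrics at or above their target, sorted.

    Targets are strict upper bounds ("< 0.5%"), so a value equal to the
    bound counts as a breach. Metrics missing from the sample are skipped.
    """
    return sorted(
        name for name, limit in TARGETS.items()
        if name in metrics and metrics[name] >= limit
    )
```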

Related docs

API_GUIDE.md — REST contract for smoke tests
AUTH_FLOW.md — token lifecycle (env vars relevant)
LAUNCH_CHECKLIST.md — pre-go-live sign-off gates
OBSERVABILITY_SETUP.md — Prometheus + Grafana wiring