# SECURITY: arac-hasar-v2
Owner: Security Engineer
Scope: pilot-production. Stores customer PII (vehicle images, user emails) and produces damage / cost estimates that may flow into invoice / claim workflows.
---
## 1. Threat Model
### 1.1 System overview
| Layer | Component | Notes |
|---|---|---|
| Edge | TLS terminator (Render / Cloudflare / nginx) | HTTPS only; no plaintext listener in prod |
| API | FastAPI (`services/backend`) | JWT-authenticated REST + WebSocket; this document covers it |
| ML | YOLO inference service (`services/ml`) | Internal; reachable only from backend |
| Storage | PostgreSQL (managed), Redis (rate-limit + pubsub), S3/MinIO (images) | Network-isolated; no public exposure |
| Clients | Next.js web, Tauri 2 desktop, React Native mobile | All consume the same API |
### 1.2 Trust boundaries
```
Public Internet
| (TLS)
[Edge / CDN]
| (private network)
[FastAPI]
| (private network, IAM)
[Postgres] [Redis] [S3] [ML service]
```
Every hop that crosses a boundary (each `|` in the diagram) is an authentication checkpoint.
### 1.3 Sensitive data inventory
| Data | Classification | Where it lives | Controls |
|---|---|---|---|
| User email | PII | Postgres `users.email`, access logs (redacted on auth paths) | TLS in transit; encrypted-at-rest (managed Postgres) |
| Password | secret | Postgres `users.password_hash` (bcrypt cost 12) | Never logged; never returned |
| Vehicle images | PII (may contain plates, faces, location via EXIF) | S3 bucket | EXIF stripped on upload; private bucket; signed URLs only |
| JWT access/refresh | secret | Client-held; never persisted server-side | Short TTL (30 min / 7 d); HS256 signed |
| API keys (pilot integrations) | secret | Postgres `api_keys.key_hash` (sha256) | Shown plaintext once on issue; revocable |
| ML inference results / cost estimates | business data | Postgres + S3 reports | Tenant isolation enforced at handler |
### 1.4 STRIDE summary
| Threat | Vector | Risk | Mitigation |
|---|---|---|---|
| Spoofing | Stolen credentials, token replay | High | Bcrypt cost 12, short access-token TTL, refresh rotation (TODO: backend wire), per-route rate limits on `/auth/login` |
| Tampering | Modified upload, tampered cost estimate | Med | Server-side decode + revalidation of images; cost computed server-side from `cost_table.yaml`; never trust client-supplied totals |
| Repudiation | "I never uploaded that" / "I never approved that estimate" | Med | Structured JSON access log w/ request_id, user_id, sha256 of uploaded image |
| Information disclosure | IDOR on `/api/v1/inspect/{id}`, EXIF GPS leak | High | Mandatory ownership check pattern (section 3); EXIF stripped before storage |
| Denial of Service | Image bomb, hot loop on `/inspect`, brute-force login | High | 20 MB cap, decompression-bomb guard (`Image.MAX_IMAGE_PIXELS`), slowapi limits |
| Elevation of privilege | `role` claim tampering, missing admin check | Crit | JWT signature verification; `require_admin` dependency; role re-read from DB on refresh |
### 1.5 Out of scope (for now)
- Multi-region failover
- DDoS at the transport layer (delegated to CDN)
- Hardware security modules / KMS-managed JWT signing keys (flagged for production-scale)
- SSO / SAML (pilot uses local accounts + API keys)
---
## 2. OWASP Top 10 (2021): Mitigations
### A01: Broken Access Control
- Every protected route depends on `require_user` (or `require_admin`).
- IDOR pattern is mandatory; see section 3.
- WebSocket connections must authenticate within 5 s of `accept()` (Backend Architect owns the WS handler β€” flagged in section 6).
- Default policy is deny: a route without an explicit auth dependency is treated as a review failure.
### A02: Cryptographic Failures
- Passwords: bcrypt (passlib), cost factor 12, `BCRYPT_ROUNDS` env-tunable.
- JWT: HS256 (acceptable for monolithic backend; migrate to RS256 if signing moves to a separate service).
- API keys: 256 bits of entropy, prefixed `ahv2_`, stored as sha256 hash, compared with `hmac.compare_digest`.
- Secrets exclusively via env vars; `.env` is gitignored.
- TLS terminated at edge; HSTS sent in staging/prod by `SecurityHeadersMiddleware`.
- No custom crypto. Period.
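The API-key rules above (256 bits of entropy, `ahv2_` prefix, sha256 at rest, constant-time compare) fit in a few lines of stdlib. This is a minimal sketch, not the project's code: the function names `issue_api_key`, `hash_api_key`, and `verify_api_key` are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

API_KEY_PREFIX = "ahv2_"


def hash_api_key(plaintext: str) -> str:
    # A fast hash is acceptable here because the key is 256 bits of random data,
    # unlike a human-chosen password (which gets bcrypt).
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def issue_api_key() -> Tuple[str, str]:
    """Return (plaintext, key_hash). Show plaintext once; persist only the hash."""
    # token_urlsafe(32) draws 32 random bytes = 256 bits of entropy
    plaintext = API_KEY_PREFIX + secrets.token_urlsafe(32)
    return plaintext, hash_api_key(plaintext)


def verify_api_key(presented: str, stored_hash: str) -> bool:
    if not presented.startswith(API_KEY_PREFIX):
        return False
    # Constant-time comparison of the two hex digests avoids a timing side channel
    return hmac.compare_digest(hash_api_key(presented), stored_hash)
```

Only `key_hash` goes into `api_keys.key_hash`; revocation is then a row delete or flag on that hash.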
### A03: Injection
- **SQL**: SQLAlchemy ORM + parameterized `text()` for any raw SQL. Never f-string user input into queries. Reviewed in PR template.
- **Command**: no `subprocess` with `shell=True`. Image processing stays in-process (PIL).
- **Path**: `sanitize_filename` strips `..`, backslashes, control chars, and prefixes a uuid4. S3 keys are never user-supplied raw.
- **Header**: request IDs whitelisted to `[A-Za-z0-9_-]`, capped at 128 chars (CRLF injection guard).
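The path and header rules can be sketched as two small helpers. This is an illustrative version only: reducing the name to its last path component, the `"upload"` fallback, and minting a fresh id when a client-supplied request id fails the whitelist (rather than truncating it) are all assumptions, and the helper `sanitize_request_id` is hypothetical.

```python
import re
import uuid
from typing import Optional

_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")
_REQUEST_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,128}")


def sanitize_filename(orig_name: str) -> str:
    """Strip control chars, backslashes and '..', then prefix a uuid4."""
    name = _CONTROL_CHARS.sub("", orig_name)
    # Normalise backslashes to '/', then keep only the last path component
    name = name.replace("\\", "/").rsplit("/", 1)[-1]
    name = name.replace("..", "") or "upload"
    # The uuid4 prefix keeps the stored key unique and server-controlled
    return f"{uuid.uuid4().hex}_{name}"


def sanitize_request_id(raw: Optional[str]) -> str:
    # Accept the client's id only if it is entirely [A-Za-z0-9_-] and <= 128 chars;
    # anything else (e.g. embedded CR/LF) is replaced with a server-generated id.
    if raw and _REQUEST_ID_RE.fullmatch(raw):
        return raw
    return uuid.uuid4().hex
```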
### A04: Insecure Design
- Threat model (section 1) reviewed before each release.
- Cost estimates computed server-side from `cost_table.yaml`; clients cannot override.
- Refresh tokens carry `role="user"` by design β€” privilege is re-derived from the DB on refresh so a leaked refresh token cannot escalate.
### A05: Security Misconfiguration
- `_validate_config()` hard-fails at import time if `JWT_SECRET_KEY` is < 32 chars in staging/production.
- Default-deny CSP (`default-src 'none'`) on all API responses.
- CORS allowlist (`ALLOWED_ORIGINS`); `allow_credentials=False` because we use bearer tokens, not cookies.
- `Server` header stripped.
- Debug / docs (`/docs`, `/redoc`) must be disabled in production (flagged for Backend Architect β€” see section 6).
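The fail-fast secret check can look roughly like this. `validate_config` is a hypothetical stand-in for the real `_validate_config()`; taking the environment as a mapping parameter (for testability) and defaulting `ENVIRONMENT` to `development` are assumptions.

```python
import os
from typing import Mapping

MIN_JWT_SECRET_LEN = 32
STRICT_ENVIRONMENTS = frozenset({"staging", "production"})


def validate_config(env: Mapping[str, str] = os.environ) -> None:
    """Raise instead of starting up with a weak JWT signing key."""
    environment = env.get("ENVIRONMENT", "development")  # assumed default
    if environment not in STRICT_ENVIRONMENTS:
        return  # local/dev runs may use a short key
    secret = env.get("JWT_SECRET_KEY", "")
    if len(secret) < MIN_JWT_SECRET_LEN:
        # Report the requirement, never the secret itself
        raise RuntimeError(
            f"JWT_SECRET_KEY must be at least {MIN_JWT_SECRET_LEN} characters "
            f"when ENVIRONMENT={environment}"
        )
```

Calling it at module import means a misconfigured staging or production deploy crashes before it can serve a single request.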
### A06: Vulnerable & Outdated Components
- `requirements.txt` is the canonical lock; CI must run `pip-audit` (or `trivy fs`) on every PR.
- Renovate / Dependabot recommended for weekly updates.
### A07: Identification & Authentication Failures
- `/auth/login` rate-limited to **5/min per IP** via slowapi.
- Generic error messages on bad credentials ("invalid email or password") β€” no user enumeration.
- Access tokens: 30 min. Refresh tokens: 7 d, single-use rotation (Backend Architect to implement `jti` blocklist in Redis β€” section 6).
- Password requirements (length / complexity) are owned by the user model layer (Database Optimizer) β€” flagged.
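Single-use refresh rotation reduces to "record each consumed `jti` until the token would have expired anyway, and reject a second use". A minimal sketch: a dict stands in for Redis, where the equivalent atomic call would be `SET used:<jti> 1 NX EX <ttl>` (the key naming is hypothetical). `UsedJtiStore` and `rotate_refresh_token` are illustrative names, and signature and `exp` verification are assumed to have already passed.

```python
import time
import uuid
from typing import Callable, Dict

REFRESH_TTL_SECONDS = 7 * 24 * 3600  # matches the 7 d refresh lifetime


class UsedJtiStore:
    """In-memory stand-in for Redis SET ... NX EX (single process, not atomic)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._expires_at: Dict[str, float] = {}
        self._clock = clock

    def mark_used(self, jti: str, ttl: float = REFRESH_TTL_SECONDS) -> bool:
        """Return True the first time a jti is seen, False on reuse within ttl."""
        now = self._clock()
        expiry = self._expires_at.get(jti)
        if expiry is not None and expiry > now:
            return False  # replay of an already-rotated token
        self._expires_at[jti] = now + ttl
        return True


def rotate_refresh_token(store: UsedJtiStore, presented_jti: str) -> str:
    """Consume the presented token's jti; return a fresh jti for its replacement."""
    if not store.mark_used(presented_jti):
        # Reuse of a consumed refresh token suggests theft: reject it
        raise PermissionError("refresh token already used")
    # Signing the new access/refresh pair is out of scope for this sketch
    return uuid.uuid4().hex
```

Redis is what makes this safe across workers: `SET ... NX` is atomic, so two concurrent refreshes with the same token cannot both succeed. Once the TTL lapses, the JWT's own `exp` check rejects the token instead.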
### A08: Software & Data Integrity Failures
- Pinned dependencies in `requirements.txt`.
- Container images built from pinned base + reproducible build.
- ML model weights checksummed at load time (Backend Architect owns `ml_service.py` β€” flagged in section 6).
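Load-time weight verification (open item 5) is a streamed sha256 compared against a pinned value. A minimal sketch: the helper names are hypothetical, and the manifest shape (a mapping of filename to hex digest, e.g. loaded from a JSON file committed next to `ml_service.py`) is an assumption.

```python
import hashlib
from pathlib import Path
from typing import Mapping


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex sha256 of a file, read in 1 MiB chunks so large weights never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path: Path, expected_sha256: str) -> None:
    """Raise before the model loader ever opens a file whose hash is not pinned."""
    if sha256_file(path) != expected_sha256.strip().lower():
        raise RuntimeError(f"sha256 mismatch for {path.name}; refusing to load weights")


def verify_manifest(weights_dir: Path, manifest: Mapping[str, str]) -> None:
    # Every file listed in the manifest must match before any of them is loaded
    for name, expected in manifest.items():
        verify_weights(weights_dir / name, expected)
```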
### A09: Security Logging & Monitoring Failures
- `AccessLogMiddleware` emits structured JSON: `ts, method, path, status, duration_ms, user_id, request_id, ip, ua`.
- On auth paths (`/auth/*`, `/login`, `/token`, `/refresh`, `/password`), the query string is dropped from the logged path.
- Request bodies are never logged.
- Bcrypt / JWT failures log at INFO with **reason class only**, never the input.
- Recommend shipping access log to a SIEM / log aggregator (Loki / CloudWatch) with retention >= 90 days.
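The access-log rules can be sketched as a pure function that builds one JSON line with the listed fields. This is not `AccessLogMiddleware` itself: `access_log_line` is hypothetical, and matching auth paths by substring (so that `/api/v1/auth/login` is covered too) is an assumed rule that errs toward suppressing more.

```python
import json
import time
from typing import Optional
from urllib.parse import urlsplit

# Assumed rule: substring match, deliberately over-inclusive
AUTH_PATH_MARKERS = ("/auth/", "/login", "/token", "/refresh", "/password")


def is_auth_path(path: str) -> bool:
    return any(marker in path for marker in AUTH_PATH_MARKERS)


def access_log_line(
    *,
    method: str,
    url: str,
    status: int,
    duration_ms: float,
    user_id: Optional[str],
    request_id: str,
    ip: str,
    ua: str,
) -> str:
    """Build one JSON access-log line. There is deliberately no body parameter."""
    parts = urlsplit(url)
    path = parts.path
    # Auth paths can carry emails or tokens in the query string: drop it entirely
    if parts.query and not is_auth_path(parts.path):
        path = f"{path}?{parts.query}"
    record = {
        "ts": round(time.time(), 3),
        "method": method,
        "path": path,
        "status": status,
        "duration_ms": round(duration_ms, 1),
        "user_id": user_id,
        "request_id": request_id,
        "ip": ip,
        "ua": ua,
    }
    return json.dumps(record, separators=(",", ":"))
```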
### A10: Server-Side Request Forgery (SSRF)
- The backend never fetches user-supplied URLs.
- Image uploads are received as multipart bytes β€” no fetch-by-URL path exists.
- If a "fetch from URL" feature is added later, it MUST:
  1. Resolve DNS server-side once and reject private / link-local ranges.
  2. Disallow redirects to private ranges.
  3. Run in a dedicated egress-restricted network namespace.
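Requirement 1 can be sketched with the stdlib `ipaddress` and `socket` modules. This is a hedged sketch only: the helper names are hypothetical, it does not handle redirects (requirement 2) or network isolation (requirement 3), and the caller is assumed to connect to the returned IPs directly so that a second DNS lookup cannot be rebound to an internal address.

```python
import ipaddress
import socket
from typing import List


def is_public_address(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    # is_global is False for private (RFC 1918), loopback, link-local (including the
    # 169.254.169.254 cloud metadata address) and other special-purpose ranges.
    # Multicast is excluded explicitly because is_global does not cover it.
    return addr.is_global and not addr.is_multicast


def resolve_public_addresses(host: str, port: int = 443) -> List[str]:
    """Resolve once; refuse if ANY answer is non-public. Connect only to these IPs."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    addrs = sorted({info[4][0] for info in infos})
    if not addrs or not all(is_public_address(a) for a in addrs):
        raise ValueError(f"refusing to fetch {host!r}: resolves to a non-public address")
    return addrs
```

Rejecting when *any* answer is private (instead of picking a public one) closes the trick where a hostname returns both a public and an internal record.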
---
## 3. Authorization pattern (mandatory)
Every endpoint that touches a tenant-scoped resource MUST follow this shape:
```python
from uuid import UUID

from fastapi import APIRouter, Depends, HTTPException, status
from sqlalchemy.ext.asyncio import AsyncSession

from security import require_user, TokenPayload
from db import get_db          # assumed module path for the session dependency
from models import Inspection  # assumed module path for the ORM model

router = APIRouter()


@router.get("/api/v1/inspect/{inspection_id}")
async def get_inspection(
    inspection_id: UUID,
    user: TokenPayload = Depends(require_user),
    db: AsyncSession = Depends(get_db),
):
    row = await db.get(Inspection, inspection_id)
    if row is None:
        # 404, not 403, to avoid leaking existence
        raise HTTPException(status.HTTP_404_NOT_FOUND)
    if row.user_id != user.user_id and user.role != "admin":
        # IDOR check. Same 404 to prevent enumeration.
        raise HTTPException(status.HTTP_404_NOT_FOUND)
    return row
```
Rules:
1. **Always** check ownership before returning a row.
2. **Always** return `404`, never `403`, when the user isn't the owner (no existence oracle).
3. Admin override goes through `user.role == "admin"`, never a query param.
4. Bulk endpoints (e.g. `GET /api/v1/inspect`) MUST filter `WHERE user_id = :uid` in the query β€” never in Python.
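Rule 4 in miniature: the ownership filter is a bound parameter in the SQL itself. In this sketch `sqlite3` stands in for Postgres + SQLAlchemy so it runs anywhere; the `inspections` table and the `list_inspections` helper are hypothetical. With SQLAlchemy the equivalent would be roughly `select(Inspection).where(Inspection.user_id == user.user_id)`.

```python
import sqlite3
from typing import List, Tuple


def list_inspections(conn: sqlite3.Connection, user_id: str, is_admin: bool) -> List[Tuple[str, str]]:
    if is_admin:
        # Admin override comes from the verified role (rule 3), never a query param
        return conn.execute("SELECT id, user_id FROM inspections ORDER BY id").fetchall()
    # The database applies the ownership filter, so other tenants' rows never
    # reach Python; the bound parameter also rules out SQL injection.
    return conn.execute(
        "SELECT id, user_id FROM inspections WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()
```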
---
## 4. File upload pipeline
```
multipart bytes
-> validate_image_upload(buf)
size cap (20 MB)
magic-byte MIME sniff (jpeg / png / webp only)
PIL decode + verify()
decompression-bomb guard
EXIF orientation applied
EXIF metadata stripped (PII: GPS, camera serial, timestamps)
dimension cap (10000 x 10000)
-> sanitize_filename(orig_name)
-> upload to S3 with server-generated key
Content-Type forced to sniffed MIME
bucket policy: private, no public-read
served via short-lived presigned URLs
```
Hard rules:
- Never trust client-supplied `Content-Type`.
- Never store the raw user-supplied filename as the S3 key.
- Never serve images from a domain that can execute scripts (use a separate static / signed-URL domain).
- S3 bucket policy must deny `s3:GetObject` to the public (`"Principal": "*"`).
---
## 5. CSRF
The API is bearer-token only (`Authorization: Bearer <jwt>`). Browsers do **not** automatically attach `Authorization` headers cross-origin, so the classic CSRF vector (auto-submit a form, browser attaches cookie) does not apply.
This is enforced by:
- `allow_credentials=False` on CORS.
- No `Set-Cookie` issued anywhere in the backend.
- Tight `ALLOWED_ORIGINS`.
If cookie-based sessions are ever introduced (e.g. SSR Next.js with httponly cookies), CSRF tokens become mandatory.
---
## 6. Open items for follow-up (NOT owned by Security)
Items below are flagged for the corresponding owner; Security has not modified those files.
| # | Item | Owner | Severity |
|---|---|---|---|
| 1 | Refresh-token rotation: persist used `jti` in Redis with TTL = refresh lifetime; reject reuse | Backend Architect | High |
| 2 | Disable `/docs` and `/redoc` in production (`docs_url=None` when `ENVIRONMENT=production`) | Backend Architect | Med |
| 3 | WebSocket auth: enforce JWT within 5 s of `accept()`, close 4401 otherwise | Backend Architect (`ws.py`) | High |
| 4 | Password policy (min 12 chars, breach check via HIBP k-anonymity) at registration | Database Optimizer (`models.py`) + Backend Architect (handler) | Med |
| 5 | ML weights integrity: sha256 manifest verified before load in `ml_service.py` | Backend Architect | Med |
| 6 | S3 bucket policy review: confirm `BlockPublicAcls`, `IgnorePublicAcls`, `BlockPublicPolicy`, `RestrictPublicBuckets` all true; encryption at rest enabled | Backend Architect (`storage.py`) + Infra | High |
| 7 | Audit log: separate immutable stream for security events (login success/failure, role change, api-key issue/revoke) | Backend Architect | Med |
| 8 | Secret rotation runbook (JWT key, DB password, S3 keys) | Infra / Ops | Med |
| 9 | Penetration test before GA (target: OWASP ASVS L2) | External | High |
| 10 | KMS-managed JWT signing (migrate HS256 -> RS256/EdDSA) at scale | Backend Architect | Low (deferred) |
| 11 | CI security gates: `pip-audit`, `gitleaks`, `semgrep` on every PR | DevOps | High |
| 12 | Brute-force / credential-stuffing detection beyond simple rate limit (e.g. account lockout after N failures with cool-down) | Backend Architect | Med |
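Open item 3 (WebSocket auth deadline) maps naturally onto `asyncio.wait_for`. A minimal sketch under stated assumptions: the protocol where the first text frame after `accept()` carries the bearer token is hypothetical, and `authenticate_ws` takes plain callables so it runs without a server. In a FastAPI handler those would plausibly be `websocket.receive_text`, the existing JWT verifier, and `lambda code: websocket.close(code=code)`.

```python
import asyncio
from typing import Awaitable, Callable, Optional

WS_AUTH_TIMEOUT_S = 5.0
WS_CLOSE_UNAUTHORIZED = 4401  # application-defined close code (4000-4999 range)


async def authenticate_ws(
    receive_text: Callable[[], Awaitable[str]],
    verify_token: Callable[[str], Optional[str]],
    close: Callable[[int], Awaitable[None]],
    timeout: float = WS_AUTH_TIMEOUT_S,
) -> Optional[str]:
    """Return the authenticated user id, or close with 4401 and return None."""
    try:
        # Hypothetical protocol: the first frame after accept() is the token
        token = await asyncio.wait_for(receive_text(), timeout=timeout)
    except asyncio.TimeoutError:
        await close(WS_CLOSE_UNAUTHORIZED)
        return None
    user_id = verify_token(token)  # None means invalid or expired
    if user_id is None:
        await close(WS_CLOSE_UNAUTHORIZED)
        return None
    return user_id
```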
---
## 7. Deploy checklist
Before tagging a release that goes to staging or production:
- [ ] `JWT_SECRET_KEY` is set, >= 32 chars, unique per environment.
- [ ] `ENVIRONMENT` set to `staging` or `production` (enables HSTS + strict config validation).
- [ ] `ALLOWED_ORIGINS` populated with the exact production origins.
- [ ] `RATE_LIMIT_REDIS_URL` points to a managed Redis (not memory://).
- [ ] `BCRYPT_ROUNDS=12` (or higher; benchmark target ~250 ms per hash on prod CPU).
- [ ] `/docs` and `/redoc` disabled.
- [ ] Postgres / Redis / S3 reachable only over private network.
- [ ] S3 bucket: private, encryption-at-rest, lifecycle rule to purge after retention window.
- [ ] TLS cert valid; HSTS preload submitted if appropriate.
- [ ] `pip-audit` and `gitleaks` green on the build SHA.
- [ ] Access log is shipping to the aggregator and is searchable by `request_id`.
- [ ] Incident-response runbook (who to page, how to revoke a leaked JWT secret, how to rotate API keys) is current.
- [ ] Backup + restore tested in the last 30 days.
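The env-var items on this checklist can be turned into a CI gate. A hedged sketch: `preflight` is a hypothetical helper, the comma-separated `ALLOWED_ORIGINS` format is an assumption, and the `JWT_SECRET_KEY` length is left out because `_validate_config()` already enforces it at import time. Treating an unset `BCRYPT_ROUNDS` as a failure is deliberately stricter than necessary if the app default is already 12.

```python
import os
from typing import List, Mapping


def preflight(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the failed checks; an empty list means these items pass."""
    problems: List[str] = []
    if env.get("ENVIRONMENT") not in ("staging", "production"):
        problems.append("ENVIRONMENT must be 'staging' or 'production'")
    # Assumed format: comma-separated list of exact origins
    origins = [o.strip() for o in env.get("ALLOWED_ORIGINS", "").split(",") if o.strip()]
    if not origins or "*" in origins:
        problems.append("ALLOWED_ORIGINS must list exact origins, never '*'")
    redis_url = env.get("RATE_LIMIT_REDIS_URL", "")
    if not redis_url or redis_url.startswith("memory://"):
        problems.append("RATE_LIMIT_REDIS_URL must point at a managed Redis, not memory://")
    try:
        rounds = int(env.get("BCRYPT_ROUNDS", "0"))
    except ValueError:
        rounds = 0  # an unparseable value fails the check instead of crashing it
    if rounds < 12:
        problems.append("BCRYPT_ROUNDS must be 12 or higher")
    return problems
```

A CI step could print the returned list and exit non-zero whenever it is non-empty.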
---
## 8. Reporting a vulnerability
Send reports to security@ (mailbox TBD). Include reproduction steps, impact, and a proposed disclosure timeline. We commit to acknowledging within 72 h and patching critical issues within 7 days.