lukhsaankumar committed on
Commit 52c0a32 · 1 Parent(s): 9fd7a87

Deploy DeepFake Detector API - 2026-04-20 23:04:30
COLD_START_OPTIMIZATION.md CHANGED
@@ -50,11 +50,10 @@ Note:
 
 ## Current Bottlenecks
 
- 1. Runtime model download during startup from Hugging Face Hub.
- 2. Sequential submodel loading in model registry.
- 3. Startup gap before model load logs (from 04:24:02 to 04:25:15) that should be instrumented for precise attribution.
- 4. Environment issue: libgomp reports invalid OMP_NUM_THREADS value.
- 5. Model compatibility warning: scikit-learn pickle version mismatch at startup.
 
 ## Implementation Plan
 
@@ -216,34 +215,66 @@ startup_dt = time.perf_counter() - startup_t0
 logger.info(f"Startup total duration_seconds={startup_dt:.3f}")
 ```
 
- ## Phase 4: Runtime Hygiene (Low Effort, Prevent Hidden Slowdowns)
 
- ### 4.1 Fix OMP setting warning
 
- Target file: start.sh
 
- Add a valid default:
 
- ```bash
- export OMP_NUM_THREADS="${OMP_NUM_THREADS:-1}"
- ```
 
- This removes:
- - libgomp: Invalid value for environment variable OMP_NUM_THREADS
 
- ### 4.2 Pin scikit-learn to training-compatible version
 
- Target file: requirements.txt
 
- Observed warning indicates model pickle was produced with 1.6.1 while runtime uses 1.8.0.
 
- Pin:
 
- ```text
- scikit-learn==1.6.1
- ```
 
- This is not directly a speed optimization, but it removes compatibility risk during cold start model deserialization.
 
 ## Validation and Benchmark Protocol
 
@@ -306,16 +337,51 @@ Source log window:
 - End-to-end startup remained dominated by pre-lifespan/init time (98s still much larger than model load slice).
 - Runtime hygiene warnings no longer appeared in this run (no OMP warning and no sklearn pickle version warning).
 
 ## Comparison Template (Fill After Implementation)
 
- | Metric | Baseline (2026-04-20) | After Phase 1 | After Phase 2 | Final |
- |---|---:|---:|---:|---:|
- | Queue/build to app startup | 28s | 36s | 119s | |
- | App startup to model-ready | 94s | 99s | 98s | |
- | API model load phase | 21s | 5s | 4s | |
- | vit-base load | 13s | 1s | 2s | |
- | deit-distilled load | 5s | 2s | 2s | |
- | Total visible build timed stages | 20.4s | 28.0s | 112.7s | |
 
 ## Expected Outcome
 
 
 
 ## Current Bottlenecks
 
+ 1. Dominant pre-app startup delay before Python module import begins.
+ 2. Build-time prefetch cost when cache layers miss (extra build wall time).
+ 3. Model loading is no longer dominant (~4s with current cache and bounded parallel load).
+ 4. Cold-start variance likely includes platform scheduling/provisioning overhead.
 
 ## Implementation Plan
 
 logger.info(f"Startup total duration_seconds={startup_dt:.3f}")
 ```
 
+ ## Phase 4: Use Persistent Storage Cache (/data)
 
+ Goal:
+ - Make /data the primary cache location so model artifacts survive container rebuilds/restarts.
 
+ Target files:
+ - app/core/config.py
+ - start.sh
+ - app/services/hf_hub_service.py
 
+ Plan:
+ 1. Prefer /data-backed cache paths when available:
+    - HF_HOME=/data/.cache/huggingface
+    - HF_CACHE_DIR=/data/.hf_cache
+ 2. Keep fallback to /app/.hf_cache when /data is unavailable.
+ 3. Ensure startup creates/chowns cache directories safely.
+ 4. Keep cache-hit logging so verification remains explicit in logs.
 
+ Expected impact:
+ - Faster warm boots across deploys.
+ - Lower risk of repeated network fetch for large model files.
 
240
+ ## Phase 5: Decouple Build From Prefetch
 
241
 
242
+ Goal:
243
+ - Reduce rebuild penalty from model prefetch while keeping runtime fast.
244
 
245
+ Target file:
246
+ - Dockerfile
247
 
248
+ Plan:
249
+ 1. Make build-time prefetch optional via ARG/ENV flag.
250
+ 2. Default to skipping build prefetch when persistent /data cache is enabled.
251
+ 3. Keep one-time warm path at runtime (guarded by cache/sentinel file in /data).
252
 
253
+ Expected impact:
254
+ - Faster image rebuild/push cycles.
255
+ - Better developer iteration speed without sacrificing warmed production startup.
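
The runtime half of plan step 3 — a one-time warm path guarded by a sentinel file — could be sketched as below. This is a minimal illustration only: `warm_once` and the commented-out `prefetch_models` call are hypothetical names, not functions from this repo.

```bash
#!/usr/bin/env bash
# Sketch: run prefetch only once per persistent cache root, guarded by a sentinel file.
warm_once() {
  local root="${1:-/data}"
  local sentinel="$root/.hf_cache/.prefetch_done"
  if [ -f "$sentinel" ]; then
    echo "cache warm: skipping prefetch"
    return 0
  fi
  echo "cache cold: running prefetch"
  # prefetch_models "$root"   # hypothetical: download model artifacts into the cache
  mkdir -p "$(dirname "$sentinel")"
  touch "$sentinel"
}

# Demo against a throwaway root instead of /data.
demo_root="$(mktemp -d)"
warm_once "$demo_root"   # first boot: prefetches and writes the sentinel
warm_once "$demo_root"   # subsequent boots: skips
```

Because the sentinel lives under the persistent mount, rebuilding the image does not reset the guard.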
 
+ ## Phase 6: Platform/GPU Startup Characterization
 
+ Goal:
+ - Quantify how much of remaining cold start is platform provisioning vs app code.
 
+ Plan:
+ 1. Run repeated cold starts with identical image on current T4.
+ 2. If available, test one higher-tier GPU Space and compare only phase3 markers.
+ 3. Record variance in:
+    - Application Startup -> module_import_start
+    - module_import_complete -> startup complete
 
+ Notes:
+ - GPU type can affect model initialization time.
+ - The measured dominant delay currently occurs before app module import, so platform scheduling/provisioning is likely the bigger lever than model code tuning.
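
The variance recording that Phase 6 calls for can be done with a few lines once the markers are scraped into timestamp pairs. The first run below uses the real timestamps from the Phase 3 log window; the second run's values are made up purely to illustrate the aggregation.

```python
from datetime import datetime
from statistics import mean, pstdev

# Per-run markers pulled from Space logs (HH:MM:SS strings).
# Run 1 is from the Phase 3 log window; run 2 is a hypothetical repeat.
runs = [
    {"app_start": "06:02:56", "import_start": "06:04:29"},
    {"app_start": "06:10:03", "import_start": "06:11:41"},
]

def seconds_between(t0: str, t1: str) -> float:
    """Difference in seconds between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)).total_seconds()

gaps = [seconds_between(r["app_start"], r["import_start"]) for r in runs]
print(f"pre-import gap: mean={mean(gaps):.1f}s stdev={pstdev(gaps):.1f}s")
```

A small standard deviation across repeated cold starts would point at a stable platform overhead; large swings would implicate scheduling/provisioning.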
+
+ ## Phase 7: Runtime Hygiene (Completed)
 
+ Completed changes:
+ 1. Set valid OMP default in start.sh.
+ 2. Pin scikit-learn to 1.6.1 for pickle compatibility.
 
 ## Validation and Benchmark Protocol
 
 - End-to-end startup remained dominated by pre-lifespan/init time (98s still much larger than model load slice).
 - Runtime hygiene warnings no longer appeared in this run (no OMP warning and no sklearn pickle version warning).
 
+ ## Phase 3 Results and Bottleneck Attribution
+
+ Source log window:
+ - Build queued at 2026-04-20 06:01:57
+ - Application startup begins at 2026-04-20 06:02:56
+ - Models loaded successfully at 2026-04-20 06:04:37
+
+ ### Phase 3 Timing Summary
+
+ | Segment | Start | End | Duration | Notes |
+ |---|---:|---:|---:|---|
+ | Queue/build to app startup | 06:01:57 | 06:02:56 | 59s | Includes scheduling, build finalization, image start |
+ | App startup to model-ready | 06:02:56 | 06:04:37 | 101s | End-to-end startup from Space startup marker |
+ | API model load phase | 06:04:33 | 06:04:37 | 4s | From app startup handler to models loaded |
+
+ ### Phase 3 Instrumentation Breakdown (Container Runtime)
+
+ | Marker | Duration |
+ |---|---:|
+ | module_import_complete | 4.050s |
+ | startup_model_load_duration_seconds | 3.967s |
+ | startup_lifespan_total_duration_seconds | 3.967s |
+ | load_from_fusion_repo_total_duration_seconds | 3.967s |
+
+ ### Bottleneck Attribution
+
+ - Dominant gap is before module import:
+    - 06:02:56 (Application Startup) -> 06:04:29 (module_import_start) = 93s.
+ - App code after import is no longer the main problem:
+    - import + lifespan + model load is about 8s total.
+ - Conclusion:
+    - Remaining cold start is primarily platform/container readiness overhead, not model download/load logic.
+
 ## Comparison Template (Fill After Implementation)
 
+ | Metric | Baseline (2026-04-20) | After Phase 1 | After Phase 2 | After Phase 3 | Final |
+ |---|---:|---:|---:|---:|---:|
+ | Queue/build to app startup | 28s | 36s | 119s | 59s | |
+ | App startup to model-ready | 94s | 99s | 98s | 101s | |
+ | API model load phase | 21s | 5s | 4s | 4s | |
+ | vit-base load | 13s | 1s | 2s | 2s | |
+ | deit-distilled load | 5s | 2s | 2s | 2s | |
+ | Total visible build timed stages | 20.4s | 28.0s | 112.7s | 33.6s | |
+ | Phase3 module import duration | n/a | n/a | n/a | 4.050s | |
+ | Phase3 model registry total duration | n/a | n/a | n/a | 3.967s | |
 
 ## Expected Outcome
 
 
README.md CHANGED
@@ -56,7 +56,10 @@ Use [backend/.env.example](.env.example) as the source of truth.
 Common runtime variables:
 
 - `HF_FUSION_REPO_ID` (default: `DeepFakeDetector/fusion-logreg-final`)
- - `HF_CACHE_DIR` (default: `.hf_cache`)
 - `HF_TOKEN` (optional; required for private model repos or non-interactive HF auth)
 - `GOOGLE_API_KEY` (optional; required for Gemini explanations)
 - `HOST` (default: `0.0.0.0`)
@@ -70,6 +73,7 @@ HF Spaces deploy variables (used by [backend/deploy-to-hf.sh](deploy-to-hf.sh)):
 - `HF_SPACE_WEB_URL`
 - `HF_SPACE_APP_URL`
 - `HF_DEPLOY_DIR`
 
 ## API Endpoints
 
@@ -109,17 +113,69 @@ bash ./backend/deploy-to-hf.sh
 
 The script will:
 
 - install Hugging Face CLI if needed
 - prompt/authenticate with HF (`hf auth login`) when required
 - clone Space repo into a separate temp deploy directory
 - copy backend files as-is (single `Dockerfile` setup)
 - commit and push to the HF Space
 
 After deploy, set Space secrets in Hugging Face:
 
 - `GOOGLE_API_KEY` (if using explanation endpoints)
 - `CORS_ORIGINS` (frontend domains)
 
 ## Deploy to Railway
 
 - Set service root to `backend`
@@ -138,19 +194,12 @@ After deploy, set Space secrets in Hugging Face:
 ```text
 backend/
 ├── app/
 ├── tests/
 ├── Dockerfile
 ├── deploy-to-hf.sh
- ├── deploy-to-hf.ps1
- ├── requirements.txt
- └── README.md
- ```
-
- ## License
-
- MIT
- ├── deploy-to-hf.sh
- ├── deploy-to-hf.ps1
 ├── requirements.txt
 └── README.md
 ```
 Common runtime variables:
 
 - `HF_FUSION_REPO_ID` (default: `DeepFakeDetector/fusion-logreg-final`)
+ - `HF_CACHE_DIR` (default: `/app/.hf_cache`, auto-switched to `/data/.hf_cache` on Spaces when mounted)
+ - `HF_HOME` (default: `/app/.cache/huggingface`, auto-switched to `/data/.cache/huggingface` on Spaces)
+ - `HF_PERSISTENT_CACHE` (default: `true`)
+ - `HF_PERSISTENT_CACHE_ROOT` (default: `/data`)
 - `HF_TOKEN` (optional; required for private model repos or non-interactive HF auth)
 - `GOOGLE_API_KEY` (optional; required for Gemini explanations)
 - `HOST` (default: `0.0.0.0`)
 
 - `HF_SPACE_WEB_URL`
 - `HF_SPACE_APP_URL`
 - `HF_DEPLOY_DIR`
+ - `HF_BUCKET_SYNC_ON_DEPLOY` (default: `true`)
 
 ## API Endpoints
 
 
 The script will:
 
+ - sync `backend/data` to your HF bucket first (when configured)
 - install Hugging Face CLI if needed
 - prompt/authenticate with HF (`hf auth login`) when required
 - clone Space repo into a separate temp deploy directory
 - copy backend files as-is (single `Dockerfile` setup)
 - commit and push to the HF Space
 
+ If bucket sync should be skipped for a deploy, set:
+
+ - `HF_BUCKET_SYNC_ON_DEPLOY=false`
+
 After deploy, set Space secrets in Hugging Face:
 
 - `GOOGLE_API_KEY` (if using explanation endpoints)
 - `CORS_ORIGINS` (frontend domains)
 
+ ## Persistent Bucket Cache on Spaces
+
+ The backend now prefers persistent cache paths when a bucket is mounted read/write at `/data`.
+
+ Recommended bucket mount settings in the Space:
+
+ - Mount path: `/data`
+ - Access mode: `Read & Write`
+
+ At runtime, `start.sh` will auto-select:
+
+ - `HF_HOME=/data/.cache/huggingface`
+ - `HF_CACHE_DIR=/data/.hf_cache`
+
+ If `/data` is unavailable, it falls back to `/app/.cache/huggingface` and `/app/.hf_cache`.
+
+ ## Bucket Upload / Sync (Reproducible)
+
+ Set these optional values in `backend/.env`:
+
+ - `HF_BUCKET_URI=hf://buckets/lukhsaankumar/DeepFakeDetectorBackend-storage`
+ - `HF_BUCKET_LOCAL_DIR=./data`
+ - `HF_BUCKET_DELETE=false`
+ - `HF_BUCKET_SYNC_ON_DEPLOY=true`
+
+ Bash:
+
+ ```bash
+ cd backend
+ chmod +x ./sync-bucket.sh
+ ./sync-bucket.sh          # uses HF_BUCKET_LOCAL_DIR or ./data
+ ./sync-bucket.sh ./data   # explicit local path
+ ```
+
+ PowerShell:
+
+ ```powershell
+ cd backend
+ ./sync-bucket.ps1         # uses HF_BUCKET_LOCAL_DIR or ./data
+ ./sync-bucket.ps1 .\data  # explicit local path
+ ```
+
+ Notes:
+
+ - Scripts require authenticated HF CLI (`hf auth login`).
+ - Set `HF_BUCKET_DELETE=true` to mirror local to remote (deletes remote files not present locally).
+
 ## Deploy to Railway
 
 - Set service root to `backend`
 
 ```text
 backend/
 ├── app/
+ ├── data/
 ├── tests/
 ├── Dockerfile
 ├── deploy-to-hf.sh
+ ├── sync-bucket.sh
+ ├── sync-bucket.ps1
 ├── requirements.txt
 └── README.md
 ```
app/core/config.py CHANGED
@@ -8,6 +8,20 @@ from pydantic_settings import BaseSettings
 from typing import Optional
 
 
 class Settings(BaseSettings):
     """Application settings loaded from environment variables."""
 
@@ -16,7 +30,10 @@ class Settings(BaseSettings):
     # - DeepFakeDetector/fusion-logreg-final (Logistic Regression - default)
     # - DeepFakeDetector/fusion-meta-final (Meta-classifier)
     HF_FUSION_REPO_ID: str = "DeepFakeDetector/fusion-logreg-final"
-     HF_CACHE_DIR: str = ".hf_cache"
     HF_TOKEN: Optional[str] = None
 
     # Google Gemini API configuration
 from typing import Optional
 
 
+ def _default_hf_cache_dir() -> str:
+     """Prefer persistent storage on HF Spaces when available."""
+     if os.path.isdir("/data"):
+         return "/data/.hf_cache"
+     return "/app/.hf_cache"
+
+
+ def _default_hf_home() -> str:
+     """Prefer persistent Hugging Face home on HF Spaces when available."""
+     if os.path.isdir("/data"):
+         return "/data/.cache/huggingface"
+     return "/app/.cache/huggingface"
+
+
 class Settings(BaseSettings):
     """Application settings loaded from environment variables."""
 
     # - DeepFakeDetector/fusion-logreg-final (Logistic Regression - default)
     # - DeepFakeDetector/fusion-meta-final (Meta-classifier)
     HF_FUSION_REPO_ID: str = "DeepFakeDetector/fusion-logreg-final"
+     HF_CACHE_DIR: str = _default_hf_cache_dir()
+     HF_HOME: str = _default_hf_home()
+     HF_PERSISTENT_CACHE: bool = True
+     HF_PERSISTENT_CACHE_ROOT: str = "/data"
     HF_TOKEN: Optional[str] = None
 
     # Google Gemini API configuration
app/services/hf_hub_service.py CHANGED
@@ -38,6 +38,10 @@ class HFHubService:
         """
         self.cache_dir = cache_dir or settings.HF_CACHE_DIR
         self.token = token or settings.HF_TOKEN
 
         # Ensure cache directory exists
         Path(self.cache_dir).mkdir(parents=True, exist_ok=True)
 
         """
         self.cache_dir = cache_dir or settings.HF_CACHE_DIR
         self.token = token or settings.HF_TOKEN
+
+         # Keep Hugging Face process-wide cache settings aligned with app config.
+         os.environ.setdefault("HF_HOME", settings.HF_HOME)
+         os.environ.setdefault("HF_HUB_CACHE", self.cache_dir)
 
         # Ensure cache directory exists
         Path(self.cache_dir).mkdir(parents=True, exist_ok=True)
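
Because the service uses `os.environ.setdefault`, values already exported by `start.sh` win over the config defaults; the fallback only applies when the variable is unset. A quick standalone illustration (the `DEMO_*` variable names are arbitrary, not used by the app):

```python
import os

# Unset variable: setdefault applies the fallback.
os.environ.pop("DEMO_HF_HOME", None)
os.environ.setdefault("DEMO_HF_HOME", "/app/.cache/huggingface")

# Already-set variable (as start.sh would have exported): setdefault keeps it.
os.environ["DEMO_HF_CACHE"] = "/data/.hf_cache"
os.environ.setdefault("DEMO_HF_CACHE", "/app/.hf_cache")

print(os.environ["DEMO_HF_HOME"])   # fallback was applied
print(os.environ["DEMO_HF_CACHE"])  # existing value was kept
```

This is why the shell-level selection in `start.sh` and the Python-level defaults can coexist without fighting each other.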
data/README.md ADDED
@@ -0,0 +1,19 @@
+ # Backend Bucket Data
+
+ This folder is the local source for bucket uploads via:
+
+ - `./sync-bucket.sh`
+ - `./sync-bucket.ps1`
+ - automatic pre-deploy sync from `deploy-to-hf.sh`
+
+ Suggested contents:
+
+ - `models/` optional model artifacts you want persisted in the bucket
+ - `cache-seed/` optional Hugging Face cache seeds
+ - `metadata/` optional JSON or CSV files used by startup/runtime logic
+
+ Notes:
+
+ - This folder is optional. If it is empty or missing, deploy still works.
+ - Runtime model caching primarily uses `/data/.hf_cache` and `/data/.cache/huggingface` inside the mounted Space volume.
+ - Keep secrets out of this folder.
start.sh CHANGED
@@ -17,5 +17,21 @@ if ! [[ "${OMP_NUM_THREADS:-}" =~ ^[0-9]+$ ]]; then
   export OMP_NUM_THREADS=1
 fi
 
 echo "Starting uvicorn on port $PORT"
 exec uvicorn app.main:app --host 0.0.0.0 --port "$PORT" --log-level info
 
   export OMP_NUM_THREADS=1
 fi
 
+ # Prefer persistent HF cache on Spaces when /data is available and writable.
+ PERSIST_ROOT="${HF_PERSISTENT_CACHE_ROOT:-/data}"
+ if [ "${HF_PERSISTENT_CACHE:-true}" = "true" ] && [ -d "$PERSIST_ROOT" ] && [ -w "$PERSIST_ROOT" ]; then
+   export HF_HOME="${HF_HOME:-$PERSIST_ROOT/.cache/huggingface}"
+   if [ -z "${HF_CACHE_DIR:-}" ] || [ "${HF_CACHE_DIR}" = ".hf_cache" ] || [ "${HF_CACHE_DIR}" = "/app/.hf_cache" ]; then
+     export HF_CACHE_DIR="$PERSIST_ROOT/.hf_cache"
+   fi
+ else
+   export HF_HOME="${HF_HOME:-/app/.cache/huggingface}"
+   export HF_CACHE_DIR="${HF_CACHE_DIR:-/app/.hf_cache}"
+ fi
+
+ mkdir -p "$HF_HOME" "$HF_CACHE_DIR" 2>/dev/null || true
+ echo "Using HF cache dir: $HF_CACHE_DIR"
+ echo "Using HF home dir: $HF_HOME"
+
 echo "Starting uvicorn on port $PORT"
 exec uvicorn app.main:app --host 0.0.0.0 --port "$PORT" --log-level info
sync-bucket.ps1 ADDED
@@ -0,0 +1,60 @@
+ #!/usr/bin/env pwsh
+
+ Set-StrictMode -Version Latest
+ $ErrorActionPreference = 'Stop'
+
+ $ScriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
+ Set-Location $ScriptDir
+
+ if (Test-Path .env) {
+     Get-Content .env | ForEach-Object {
+         $line = $_.Trim()
+         if ($line -eq '' -or $line.StartsWith('#')) { return }
+         $parts = $line.Split('=', 2)
+         if ($parts.Count -ne 2) { return }
+         $name = $parts[0].Trim()
+         $value = $parts[1].Trim()
+
+         if ($value.StartsWith('"') -and $value.EndsWith('"')) {
+             $value = $value.Substring(1, $value.Length - 2)
+         } elseif ($value.StartsWith("'") -and $value.EndsWith("'")) {
+             $value = $value.Substring(1, $value.Length - 2)
+         }
+
+         if ($name -match '^[A-Za-z_][A-Za-z0-9_]*$') {
+             [Environment]::SetEnvironmentVariable($name, $value, 'Process')
+         }
+     }
+ }
+
+ $BucketUri = if ($env:HF_BUCKET_URI) { $env:HF_BUCKET_URI } else { 'hf://buckets/lukhsaankumar/DeepFakeDetectorBackend-storage' }
+ $LocalDir = if ($args.Count -gt 0 -and $args[0]) { $args[0] } elseif ($env:HF_BUCKET_LOCAL_DIR) { $env:HF_BUCKET_LOCAL_DIR } else { './data' }
+ $DeleteFlag = if ($env:HF_BUCKET_DELETE) { $env:HF_BUCKET_DELETE } else { 'false' }
+
+ if (-not (Get-Command hf -ErrorAction SilentlyContinue)) {
+     Write-Error 'Hugging Face CLI (hf) is not installed. Install guide: https://hf.co/docs/huggingface_hub/guides/cli'
+ }
+
+ try {
+     hf auth whoami | Out-Null
+ } catch {
+     Write-Error 'Hugging Face CLI is not authenticated. Run: hf auth login'
+ }
+
+ if (-not (Test-Path $LocalDir -PathType Container)) {
+     Write-Error "Local directory does not exist: $LocalDir"
+ }
+
+ Write-Host 'Syncing local directory to HF bucket'
+ Write-Host "  Local : $LocalDir"
+ Write-Host "  Bucket: $BucketUri"
+
+ if ($DeleteFlag -eq 'true') {
+     Write-Host '  Mode  : mirror (delete remote files not present locally)'
+     hf sync $LocalDir $BucketUri --delete
+ } else {
+     Write-Host '  Mode  : additive (no remote deletes)'
+     hf sync $LocalDir $BucketUri
+ }
+
+ Write-Host 'Bucket sync completed successfully.'
sync-bucket.sh ADDED
@@ -0,0 +1,83 @@
+ #!/usr/bin/env bash
+
+ # Sync local data into a Hugging Face bucket.
+ # Defaults are loaded from backend/.env when present.
+
+ set -euo pipefail
+
+ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+ cd "$SCRIPT_DIR"
+
+ if [ -f ".env" ]; then
+     while IFS= read -r raw_line || [ -n "$raw_line" ]; do
+         line="${raw_line%$'\r'}"
+         case "$line" in
+             ''|'#'*) continue ;;
+         esac
+
+         if [[ "$line" == *=* ]]; then
+             key="${line%%=*}"
+             value="${line#*=}"
+             key="$(echo "$key" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')"
+
+             if [[ ! "$key" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
+                 continue
+             fi
+
+             if [[ "$value" =~ ^\".*\"$ ]]; then
+                 value="${value:1:${#value}-2}"
+             elif [[ "$value" =~ ^\'.*\'$ ]]; then
+                 value="${value:1:${#value}-2}"
+             else
+                 value="$(echo "$value" | sed 's/[[:space:]]#.*$//;s/[[:space:]]*$//')"
+             fi
+
+             export "$key=$value"
+         fi
+     done < ./.env
+ fi
+
+ BUCKET_URI="${HF_BUCKET_URI:-hf://buckets/<username>/<bucket-name>}"
+ LOCAL_DIR="${1:-${HF_BUCKET_LOCAL_DIR:-./data}}"
+ DELETE_FLAG="${HF_BUCKET_DELETE:-false}"
+
+ # Force host-safe HF cache path for CLI operations.
+ HF_HOME_HOST_DEFAULT="${HOME:-$PWD}/.cache/huggingface"
+ HF_HOME="${HF_HOME_HOST:-${DEPLOY_HF_HOME:-$HF_HOME_HOST_DEFAULT}}"
+ export HF_HOME
+
+ if [[ "$BUCKET_URI" == *"<username>"* ]] || [[ "$BUCKET_URI" == *"<bucket-name>"* ]]; then
+     echo "ERROR: HF_BUCKET_URI is still a placeholder: $BUCKET_URI"
+     echo "Set HF_BUCKET_URI in backend/.env to your real bucket URI."
+     exit 1
+ fi
+
+ if ! command -v hf >/dev/null 2>&1; then
+     echo "ERROR: Hugging Face CLI (hf) is not installed."
+     echo "Install guide: https://hf.co/docs/huggingface_hub/guides/cli"
+     exit 1
+ fi
+
+ if ! hf auth whoami >/dev/null 2>&1; then
+     echo "ERROR: Hugging Face CLI is not authenticated. Run: hf auth login"
+     exit 1
+ fi
+
+ if [ ! -d "$LOCAL_DIR" ]; then
+     echo "ERROR: Local directory does not exist: $LOCAL_DIR"
+     exit 1
+ fi
+
+ echo "Syncing local directory to HF bucket"
+ echo "  Local : $LOCAL_DIR"
+ echo "  Bucket: $BUCKET_URI"
+
+ if [ "$DELETE_FLAG" = "true" ]; then
+     echo "  Mode  : mirror (delete remote files not present locally)"
+     hf sync "$LOCAL_DIR" "$BUCKET_URI" --delete
+ else
+     echo "  Mode  : additive (no remote deletes)"
+     hf sync "$LOCAL_DIR" "$BUCKET_URI"
+ fi
+
+ echo "Bucket sync completed successfully."