lukhsaankumar committed on
Commit
db10084
1 Parent(s): 2de2f1c

Deploy DeepFake Detector API - 2026-04-20 00:53:30

COLD_START_OPTIMIZATION.md ADDED
@@ -0,0 +1,298 @@
1
+ # Cold Start Optimization Implementation Guide (HF Spaces GPU)
2
+
3
+ ## Goal
4
+
5
+ Reduce end-to-end cold start time for the backend on Hugging Face Spaces GPU while preserving inference quality and endpoint behavior.
6
+
7
+ This guide is focused only on cold start optimization for the current FastAPI architecture.
8
+
9
+ ## Baseline From Current Logs
10
+
11
+ Source log window:
12
+ - Build queued at 2026-04-20 04:23:34
13
+ - Application startup begins at 2026-04-20 04:24:02
14
+ - Models loaded successfully at 2026-04-20 04:25:36
15
+
16
+ ### Baseline Timing Summary
17
+
18
+ | Segment | Start | End | Duration | Notes |
19
+ |---|---:|---:|---:|---|
20
+ | Queue/build to app startup | 04:23:34 | 04:24:02 | 28s | Includes scheduling, build finalization, image start |
21
+ | App startup to model-ready | 04:24:02 | 04:25:36 | 94s | Time from uvicorn start message to models loaded |
22
+ | API model load phase | 04:25:15 | 04:25:36 | 21s | From "Starting DeepFake Detector API..." to "Models loaded successfully!" |
23
+
24
+ ### Build Stage Durations Visible In Logs
25
+
26
+ | Build Stage | Duration |
27
+ |---|---:|
28
+ | Restoring cache | 19.5s |
29
+ | COPY source to /app | 0.0s |
30
+ | mkdir/chown/chmod step | 0.1s |
31
+ | Pushing image | 0.7s |
32
+ | Exporting cache | 0.1s |
33
+ | Total visible timed stages | 20.4s |
34
+
35
+ Note:
36
+ - Several Docker steps were cache hits and reported as CACHED without explicit timing.
37
+ - "Application startup complete" appears immediately after model load logs; no explicit timestamp is printed, so 04:25:36 is used as the practical ready time.
38
+
39
+ ### Model Load Breakdown (Current)
40
+
41
+ | Model | Start | End | Duration | Observation |
42
+ |---|---:|---:|---:|---|
43
+ | Fusion repo config | 04:25:15 | 04:25:16 | 1s | Fast |
44
+ | cnn-transfer-final | 04:25:16 | 04:25:17 | 1s | Fast |
45
+ | vit-base-final | 04:25:17 | 04:25:30 | 13s | Dominant bottleneck |
46
+ | deit-distilled-final | 04:25:30 | 04:25:35 | 5s | Moderate |
47
+ | gradfield-cnn-final | 04:25:35 | 04:25:35 | <1s | Fast |
48
+ | fusion model load | 04:25:35 | 04:25:36 | 1s | Fast |
49
+ | Total model load | 04:25:15 | 04:25:36 | 21s | Sequential loading |
50
+
51
+ ## Current Bottlenecks
52
+
53
+ 1. Runtime model downloads from the Hugging Face Hub during startup.
54
+ 2. Sequential submodel loading in model registry.
55
+ 3. Uninstrumented startup gap before the first model load log (04:24:02 to 04:25:15); this span needs timing markers for precise attribution.
56
+ 4. Environment issue: libgomp reports an invalid OMP_NUM_THREADS value.
57
+ 5. Model compatibility warning: scikit-learn pickle version mismatch at startup.
58
+
59
+ ## Implementation Plan
60
+
61
+ ## Phase 1: Remove Runtime Model Downloads (Highest Impact)
62
+
63
+ ### 1.1 Add model prefetch script
64
+
65
+ Create file: app/scripts/prefetch_models.py
66
+
67
+ Purpose:
68
+ - Download fusion repo and all submodel repos at build time into HF_CACHE_DIR.
69
+ - Ensure cold start does not wait on remote model downloads.
70
+
71
+ Implementation:
72
+
73
+ ```python
74
+ import asyncio
75
+ from app.core.config import settings
76
+ from app.services.model_registry import get_model_registry
77
+
78
+
79
+ async def main() -> None:
80
+ registry = get_model_registry()
81
+ await registry.load_from_fusion_repo(settings.HF_FUSION_REPO_ID, force_reload=True)
82
+
83
+
84
+ if __name__ == "__main__":
85
+ asyncio.run(main())
86
+ ```
87
+
88
+ ### 1.2 Update Dockerfile for build-time prefetch
89
+
90
+ Target file: Dockerfile
91
+
92
+ Key changes:
93
+ 1. Keep dependency installation in a stable cache layer.
94
+ 2. Copy only application code needed for prefetch before full source copy.
95
+ 3. Run prefetch script during build with HF cache directory set.
96
+ 4. Keep ownership and permissions for user uid 1000.
97
+
98
+ Implementation sketch:
99
+
100
+ ```dockerfile
101
+ FROM python:3.11-slim
102
+
103
+ WORKDIR /app
104
+
105
+ ENV PYTHONDONTWRITEBYTECODE=1 \
106
+ PYTHONUNBUFFERED=1 \
107
+ PIP_NO_CACHE_DIR=1 \
108
+ PIP_DISABLE_PIP_VERSION_CHECK=1 \
109
+ PORT=7860 \
110
+ HF_CACHE_DIR=/app/.hf_cache
111
+
112
+ RUN apt-get update && apt-get install -y --no-install-recommends \
113
+ curl \
114
+ git \
115
+ && rm -rf /var/lib/apt/lists/*
116
+
117
+ RUN useradd -m -u 1000 user
118
+ ENV PATH="/home/user/.local/bin:$PATH"
119
+
120
+ COPY requirements.txt .
121
+ RUN pip install --no-cache-dir --upgrade -r requirements.txt
122
+
123
+ # Copy app code required for prefetch
124
+ COPY app /app/app
125
+ COPY start.sh /app/start.sh
126
+
127
+ RUN mkdir -p /app/.hf_cache
128
+
129
+ # Build-time model prefetch (requires public repos or HF token in build env)
130
+ RUN python -m app.scripts.prefetch_models
131
+
132
+ RUN chown -R user:user /app && chmod +x /app/start.sh
133
+ USER user
134
+
135
+ EXPOSE 7860
136
+ CMD ["./start.sh"]
137
+ ```
138
+
139
+ Notes:
140
+ - If private model repos are used, the build environment needs an HF_TOKEN.
141
+ - This increases image size but reduces startup wait caused by downloads.
142
+
143
+ ### 1.3 Verify HF cache is reused at runtime
144
+
145
+ Target file: app/services/hf_hub_service.py
146
+
147
+ Behavior to enforce:
148
+ - Keep deterministic local_dir path under /app/.hf_cache.
149
+ - Log cache hits clearly before download attempt.
150
+
151
+ Add logic before snapshot_download call:
152
+
153
+ ```python
154
+ cached = self.get_cached_path(repo_id)
155
+ if cached and not force_download:
156
+ logger.info(f"Using cached repo for {repo_id}: {cached}")
157
+ return cached
158
+ ```
159
+
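The snippet above relies on a `get_cached_path` helper on the service. The existing method in app/services/hf_hub_service.py presumably already implements this check; the standalone sketch below only illustrates the expected contract, assuming the same deterministic directory layout used by the download path:

```python
from pathlib import Path
from typing import Optional


def get_cached_path(cache_dir: str, repo_id: str) -> Optional[str]:
    """Return the local repo directory for repo_id if it already exists, else None."""
    # Mirror the deterministic layout used when downloading: "org/name" -> "org--name".
    local_dir = Path(cache_dir) / repo_id.replace("/", "--")
    # Treat the repo as cached only when the directory exists and is non-empty.
    if local_dir.is_dir() and any(local_dir.iterdir()):
        return str(local_dir)
    return None


if __name__ == "__main__":
    # "org/vit-base-final" is a placeholder repo id used only for illustration.
    print(get_cached_path("/app/.hf_cache", "org/vit-base-final"))
```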
160
+ ## Phase 2: Parallelize Submodel Loading
161
+
162
+ Target file: app/services/model_registry.py
163
+
164
+ Current behavior:
165
+ - Submodels are loaded one by one.
166
+
167
+ New behavior:
168
+ - Load submodels concurrently with bounded parallelism.
169
+
170
+ Implementation steps:
171
+ 1. Add a semaphore, for example max concurrency 2.
172
+ 2. Replace sequential loop with asyncio.gather.
173
+ 3. Keep deterministic final registration and clear error propagation.
174
+
175
+ Implementation sketch:
176
+
177
+ ```python
178
+ sem = asyncio.Semaphore(2)
179
+
180
+ async def _load_with_limit(repo_id: str) -> None:
181
+ async with sem:
182
+ await self._load_submodel(repo_id)
183
+
184
+ tasks = [_load_with_limit(repo_id) for repo_id in submodel_repos]
185
+ results = await asyncio.gather(*tasks, return_exceptions=True)
186
+ errors = [r for r in results if isinstance(r, Exception)]
187
+ if errors:
188
+ raise RuntimeError(f"Failed to load one or more submodels: {errors}")
189
+ ```
190
+
191
+ Reason for bounded parallelism:
192
+ - Reduces startup time without overwhelming memory/network in GPU Space containers.
193
+
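To sanity-check the expected wall-clock win, the loading schedule can be simulated outside the registry. The sketch below is a toy model using sleep calls scaled down 10x from the observed per-model durations, not the real loaders:

```python
import asyncio
import time

# Simulated per-model load times, loosely based on the observed breakdown
# (scaled down 10x so the demo runs quickly).
LOAD_TIMES = {
    "cnn-transfer-final": 0.1,
    "vit-base-final": 1.3,
    "deit-distilled-final": 0.5,
    "gradfield-cnn-final": 0.1,
}


async def fake_load(name: str, seconds: float, sem: asyncio.Semaphore) -> None:
    async with sem:  # at most max_concurrency "loads" run at once
        await asyncio.sleep(seconds)


async def run(max_concurrency: int) -> float:
    sem = asyncio.Semaphore(max_concurrency)
    t0 = time.perf_counter()
    await asyncio.gather(*(fake_load(name, s, sem) for name, s in LOAD_TIMES.items()))
    return time.perf_counter() - t0


if __name__ == "__main__":
    for n in (1, 2):
        print(f"max_concurrency={n}: {asyncio.run(run(n)):.2f}s")
    # Sequential (~2.0s here, ~20s real) vs two concurrent loads (~1.3s here, ~13s real):
    # the vit-base load dominates the critical path either way.
```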
194
+ ## Phase 3: Add Startup Instrumentation For Reliable Comparisons
195
+
196
+ Target file: app/main.py
197
+
198
+ Add timing markers:
199
+ - App startup begin timestamp.
200
+ - Model loading start and end.
201
+ - Total lifespan startup duration.
202
+
203
+ Implementation sketch:
204
+
205
+ ```python
206
+ import time
207
+
208
+ startup_t0 = time.perf_counter()
209
+ ...
210
+ model_t0 = time.perf_counter()
211
+ await registry.load_from_fusion_repo(settings.HF_FUSION_REPO_ID)
212
+ model_dt = time.perf_counter() - model_t0
213
+ logger.info(f"Model load duration_seconds={model_dt:.3f}")
214
+ ...
215
+ startup_dt = time.perf_counter() - startup_t0
216
+ logger.info(f"Startup total duration_seconds={startup_dt:.3f}")
217
+ ```
218
+
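For reference, these markers normally live in the FastAPI lifespan handler in app/main.py. A minimal self-contained sketch, with the registry call replaced by a stub because the real call depends on the existing registry API:

```python
import logging
import time
from contextlib import asynccontextmanager

from fastapi import FastAPI

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("startup")


async def load_models() -> None:
    """Stub standing in for registry.load_from_fusion_repo(settings.HF_FUSION_REPO_ID)."""


@asynccontextmanager
async def lifespan(app: FastAPI):
    startup_t0 = time.perf_counter()

    model_t0 = time.perf_counter()
    await load_models()
    logger.info("Model load duration_seconds=%.3f", time.perf_counter() - model_t0)

    logger.info("Startup total duration_seconds=%.3f", time.perf_counter() - startup_t0)
    yield  # application serves requests while suspended here
    # Shutdown work, if any, goes after the yield.


app = FastAPI(lifespan=lifespan)
```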
219
+ ## Phase 4: Runtime Hygiene (Low Effort, Prevent Hidden Slowdowns)
220
+
221
+ ### 4.1 Fix OMP setting warning
222
+
223
+ Target file: start.sh
224
+
225
+ Add a valid default:
226
+
227
+ ```bash
228
+ export OMP_NUM_THREADS="${OMP_NUM_THREADS:-1}"
229
+ ```
230
+
231
+ This removes:
232
+ - libgomp: Invalid value for environment variable OMP_NUM_THREADS
233
+
234
+ ### 4.2 Pin scikit-learn to training-compatible version
235
+
236
+ Target file: requirements.txt
237
+
238
+ The observed warning indicates the model pickle was produced with scikit-learn 1.6.1 while the runtime uses 1.8.0.
239
+
240
+ Pin:
241
+
242
+ ```text
243
+ scikit-learn==1.6.1
244
+ ```
245
+
246
+ This is not directly a speed optimization, but it removes compatibility risk during cold start model deserialization.
247
+
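To make the pin enforceable rather than advisory, startup can fail fast if the installed scikit-learn version drifts from the training version. A minimal sketch; the expected version string is an assumption taken from the warning and should track the training environment:

```python
import sklearn

# Version that produced the fusion model pickle, per the startup warning (assumed here).
EXPECTED_SKLEARN_VERSION = "1.6.1"

if sklearn.__version__ != EXPECTED_SKLEARN_VERSION:
    raise RuntimeError(
        f"scikit-learn {sklearn.__version__} is installed, but the fusion model "
        f"was serialized with {EXPECTED_SKLEARN_VERSION}; update requirements.txt "
        "or re-export the model."
    )
```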
248
+ ## Validation and Benchmark Protocol
249
+
250
+ Use the same procedure before and after changes; a small helper for computing the segment durations from recorded timestamps is sketched after this list.
251
+
252
+ 1. Force a cold deployment in HF Space.
253
+ 2. Record these timestamps from logs:
254
+ - Build queued time
255
+ - Application startup time
256
+ - Starting DeepFake Detector API
257
+ - Models loaded successfully
258
+ - Application startup complete
259
+ 3. Compute:
260
+ - Queue/build to app startup
261
+ - App startup to model-ready
262
+ - API model load phase
263
+ 4. Capture per-model load durations from logs.
264
+ 5. Save a comparison table in this file.
265
+
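A small helper keeps the computed segments consistent across runs. The sketch below assumes the timestamps are copied from the logs in HH:MM:SS form; the values shown are the baseline run:

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Timestamps copied from the Space logs; the baseline run is shown as an example.
marks = {
    "build_queued": "04:23:34",
    "app_startup": "04:24:02",
    "api_start": "04:25:15",      # "Starting DeepFake Detector API..."
    "models_loaded": "04:25:36",  # "Models loaded successfully!"
}


def seconds_between(start: str, end: str) -> int:
    """Duration in whole seconds between two HH:MM:SS timestamps on the same day."""
    return int((datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds())


print("Queue/build to app startup:", seconds_between(marks["build_queued"], marks["app_startup"]), "s")
print("App startup to model-ready:", seconds_between(marks["app_startup"], marks["models_loaded"]), "s")
print("API model load phase:", seconds_between(marks["api_start"], marks["models_loaded"]), "s")
```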
266
+ ## Comparison Template (Fill After Implementation)
267
+
268
+ | Metric | Baseline (2026-04-20) | After Phase 1 | After Phase 2 | Final |
269
+ |---|---:|---:|---:|---:|
270
+ | Queue/build to app startup | 28s | | | |
271
+ | App startup to model-ready | 94s | | | |
272
+ | API model load phase | 21s | | | |
273
+ | vit-base load | 13s | | | |
274
+ | deit-distilled load | 5s | | | |
275
+ | Total visible build timed stages | 20.4s | | | |
276
+
277
+ ## Expected Outcome
278
+
279
+ Primary expected wins:
280
+ 1. Reduced startup latency by avoiding runtime model downloads.
281
+ 2. Reduced model load wall-clock via parallel submodel loads.
282
+ 3. Stable and comparable timing data for iterative tuning.
283
+
284
+ Secondary expected wins:
285
+ 1. Cleaner startup logs (no OMP warning).
286
+ 2. Lower risk from sklearn deserialization mismatch.
287
+
288
+ ## Rollback Plan
289
+
290
+ If anything regresses:
291
+ 1. Revert parallel loading only and keep build-time prefetch.
292
+ 2. Revert build-time prefetch and restore runtime download flow.
293
+ 3. Keep instrumentation to retain comparability.
294
+
295
+ ## Notes
296
+
297
+ - This plan intentionally keeps current FastAPI inference architecture unchanged.
298
+ - Triton feasibility can be revisited after cold start metrics improve and stabilize.
Dockerfile CHANGED
@@ -11,7 +11,8 @@ ENV PYTHONDONTWRITEBYTECODE=1 \
11
  PYTHONUNBUFFERED=1 \
12
  PIP_NO_CACHE_DIR=1 \
13
  PIP_DISABLE_PIP_VERSION_CHECK=1 \
14
- PORT=7860
 
15
 
16
  # Install system dependencies
17
  RUN apt-get update && apt-get install -y --no-install-recommends \
@@ -30,8 +31,9 @@ ENV PATH="/home/user/.local/bin:$PATH"
30
  COPY --chown=user:user requirements.txt .
31
  RUN pip install --no-cache-dir --upgrade -r requirements.txt
32
 
33
- # Copy application code
34
- COPY --chown=user:user . /app
 
35
 
36
  # Switch to root to create cache directory and set permissions
37
  USER root
@@ -40,6 +42,12 @@ RUN mkdir -p /app/.hf_cache && chown -R user:user /app/.hf_cache && chmod +x /ap
40
  # Switch back to user
41
  USER user
42
 
43
  # Expose default app port
44
  EXPOSE 7860
45
 
 
11
  PYTHONUNBUFFERED=1 \
12
  PIP_NO_CACHE_DIR=1 \
13
  PIP_DISABLE_PIP_VERSION_CHECK=1 \
14
+ PORT=7860 \
15
+ HF_CACHE_DIR=/app/.hf_cache
16
 
17
  # Install system dependencies
18
  RUN apt-get update && apt-get install -y --no-install-recommends \
 
31
  COPY --chown=user:user requirements.txt .
32
  RUN pip install --no-cache-dir --upgrade -r requirements.txt
33
 
34
+ # Copy only files required for model prefetch first.
35
+ COPY --chown=user:user app /app/app
36
+ COPY --chown=user:user start.sh /app/start.sh
37
 
38
  # Switch to root to create cache directory and set permissions
39
  USER root
 
42
  # Switch back to user
43
  USER user
44
 
45
+ # Prefetch model artifacts at build time so startup does not wait on model downloads.
46
+ RUN python -m app.scripts.prefetch_models
47
+
48
+ # Copy full project contents after prefetch so docs/tests edits do not invalidate prefetch layers.
49
+ COPY --chown=user:user . /app
50
+
51
  # Expose default app port
52
  EXPOSE 7860
53
 
README.md CHANGED
@@ -101,13 +101,7 @@ Recommended path is the Bash deploy script.
101
 
102
  1. Configure [backend/.env](.env) from [backend/.env.example](.env.example)
103
  2. Ensure `HF_SPACE_URL` and related deploy variables are set
104
- 3. Run from backend folder:
105
-
106
- ```bash
107
- bash ./deploy-to-hf.sh
108
- ```
109
-
110
- Or run from repo root:
111
 
112
  ```bash
113
  bash ./backend/deploy-to-hf.sh
 
101
 
102
  1. Configure [backend/.env](.env) from [backend/.env.example](.env.example)
103
  2. Ensure `HF_SPACE_URL` and related deploy variables are set
104
+ 3. Run from the repo root:
105
 
106
  ```bash
107
  bash ./backend/deploy-to-hf.sh
app/scripts/__init__.py ADDED
@@ -0,0 +1 @@
1
+ """Helper scripts for backend operational tasks."""
app/scripts/prefetch_models.py ADDED
@@ -0,0 +1,23 @@
1
+ """Build-time model prefetch utility for reducing cold-start downloads."""
2
+
3
+ import asyncio
4
+
5
+ from app.core.config import settings
6
+ from app.core.logging import get_logger, setup_logging
7
+ from app.services.model_registry import get_model_registry
8
+
9
+
10
+ setup_logging()
11
+ logger = get_logger(__name__)
12
+
13
+
14
+ async def main() -> None:
15
+ """Download fusion and submodel repositories into the configured HF cache."""
16
+ logger.info("Starting build-time model prefetch for %s", settings.HF_FUSION_REPO_ID)
17
+ registry = get_model_registry()
18
+ await registry.load_from_fusion_repo(settings.HF_FUSION_REPO_ID, force_reload=True)
19
+ logger.info("Build-time model prefetch completed")
20
+
21
+
22
+ if __name__ == "__main__":
23
+ asyncio.run(main())
app/services/hf_hub_service.py CHANGED
@@ -69,6 +69,11 @@ class HFHubService:
69
  logger.info(f"Downloading repo: {repo_id} (revision={revision}, force={force_download})")
70
 
71
  try:
72
  # Use local_dir instead of cache_dir to avoid symlink issues on Windows
73
  repo_name = repo_id.replace("/", "--")
74
  local_dir = Path(self.cache_dir) / repo_name
 
69
  logger.info(f"Downloading repo: {repo_id} (revision={revision}, force={force_download})")
70
 
71
  try:
72
+ cached_path = self.get_cached_path(repo_id)
73
+ if cached_path and not force_download:
74
+ logger.info(f"Using cached repo for {repo_id}: {cached_path}")
75
+ return cached_path
76
+
77
  # Use local_dir instead of cache_dir to avoid symlink issues on Windows
78
  repo_name = repo_id.replace("/", "--")
79
  local_dir = Path(self.cache_dir) / repo_name