lukhsaankumar committed on
Commit
db10084
1 Parent(s): 2de2f1c

Deploy DeepFake Detector API - 2026-04-20 00:53:30

COLD_START_OPTIMIZATION.md ADDED
@@ -0,0 +1,298 @@
1
+ # Cold Start Optimization Implementation Guide (HF Spaces GPU)
2
+
3
+ ## Goal
4
+
5
+ Reduce end-to-end cold start time for the backend on Hugging Face Spaces GPU while preserving inference quality and endpoint behavior.
6
+
7
+ This guide is focused only on cold start optimization for the current FastAPI architecture.
8
+
9
+ ## Baseline From Current Logs
10
+
11
+ Source log window:
12
+ - Build queued at 2026-04-20 04:23:34
13
+ - Application startup begins at 2026-04-20 04:24:02
14
+ - Models loaded successfully at 2026-04-20 04:25:36
15
+
16
+ ### Baseline Timing Summary
17
+
18
+ | Segment | Start | End | Duration | Notes |
19
+ |---|---:|---:|---:|---|
20
+ | Queue/build to app startup | 04:23:34 | 04:24:02 | 28s | Includes scheduling, build finalization, image start |
21
+ | App startup to model-ready | 04:24:02 | 04:25:36 | 94s | Time from uvicorn start message to models loaded |
22
+ | API model load phase | 04:25:15 | 04:25:36 | 21s | From "Starting DeepFake Detector API..." to "Models loaded successfully!" |
23
+
24
+ ### Build Stage Durations Visible In Logs
25
+
26
+ | Build Stage | Duration |
27
+ |---|---:|
28
+ | Restoring cache | 19.5s |
29
+ | COPY source to /app | 0.0s |
30
+ | mkdir/chown/chmod step | 0.1s |
31
+ | Pushing image | 0.7s |
32
+ | Exporting cache | 0.1s |
33
+ | Total visible timed stages | 20.4s |
34
+
35
+ Note:
36
+ - Several Docker steps were cache hits and reported as CACHED without explicit timing.
37
+ - "Application startup complete" appears immediately after model load logs; no explicit timestamp is printed, so 04:25:36 is used as the practical ready time.
38
+
39
+ ### Model Load Breakdown (Current)
40
+
41
+ | Model | Start | End | Duration | Observation |
42
+ |---|---:|---:|---:|---|
43
+ | Fusion repo config | 04:25:15 | 04:25:16 | 1s | Fast |
44
+ | cnn-transfer-final | 04:25:16 | 04:25:17 | 1s | Fast |
45
+ | vit-base-final | 04:25:17 | 04:25:30 | 13s | Dominant bottleneck |
46
+ | deit-distilled-final | 04:25:30 | 04:25:35 | 5s | Moderate |
47
+ | gradfield-cnn-final | 04:25:35 | 04:25:35 | <1s | Fast |
48
+ | fusion model load | 04:25:35 | 04:25:36 | 1s | Fast |
49
+ | Total model load | 04:25:15 | 04:25:36 | 21s | Sequential loading |
50
+
51
+ ## Current Bottlenecks
52
+
53
+ 1. Runtime model downloads from the Hugging Face Hub during startup.
54
+ 2. Sequential submodel loading in model registry.
55
+ 3. Uninstrumented startup gap before the first model load log (04:24:02 to 04:25:15); this span needs timing markers for precise attribution.
56
+ 4. Environment issue: libgomp reports an invalid OMP_NUM_THREADS value.
57
+ 5. Model compatibility warning: scikit-learn pickle version mismatch at startup.
58
+
59
+ ## Implementation Plan
60
+
61
+ ## Phase 1: Remove Runtime Model Downloads (Highest Impact)
62
+
63
+ ### 1.1 Add model prefetch script
64
+
65
+ Create file: app/scripts/prefetch_models.py
66
+
67
+ Purpose:
68
+ - Download fusion repo and all submodel repos at build time into HF_CACHE_DIR.
69
+ - Ensure cold start does not wait on remote model downloads.
70
+
71
+ Implementation:
72
+
73
+ ```python
74
+ import asyncio
75
+ from app.core.config import settings
76
+ from app.services.model_registry import get_model_registry
77
+
78
+
79
+ async def main() -> None:
80
+ registry = get_model_registry()
81
+ await registry.load_from_fusion_repo(settings.HF_FUSION_REPO_ID, force_reload=True)
82
+
83
+
84
+ if __name__ == "__main__":
85
+ asyncio.run(main())
86
+ ```
87
+
88
+ ### 1.2 Update Dockerfile for build-time prefetch
89
+
90
+ Target file: Dockerfile
91
+
92
+ Key changes:
93
+ 1. Keep dependency installation in a stable cache layer.
94
+ 2. Copy only application code needed for prefetch before full source copy.
95
+ 3. Run prefetch script during build with HF cache directory set.
96
+ 4. Keep ownership and permissions for user uid 1000.
97
+
98
+ Implementation sketch:
99
+
100
+ ```dockerfile
101
+ FROM python:3.11-slim
102
+
103
+ WORKDIR /app
104
+
105
+ ENV PYTHONDONTWRITEBYTECODE=1 \
106
+ PYTHONUNBUFFERED=1 \
107
+ PIP_NO_CACHE_DIR=1 \
108
+ PIP_DISABLE_PIP_VERSION_CHECK=1 \
109
+ PORT=7860 \
110
+ HF_CACHE_DIR=/app/.hf_cache
111
+
112
+ RUN apt-get update && apt-get install -y --no-install-recommends \
113
+ curl \
114
+ git \
115
+ && rm -rf /var/lib/apt/lists/*
116
+
117
+ RUN useradd -m -u 1000 user
118
+ ENV PATH="/home/user/.local/bin:$PATH"
119
+
120
+ COPY requirements.txt .
121
+ RUN pip install --no-cache-dir --upgrade -r requirements.txt
122
+
123
+ # Copy app code required for prefetch
124
+ COPY app /app/app
125
+ COPY start.sh /app/start.sh
126
+
127
+ RUN mkdir -p /app/.hf_cache
128
+
129
+ # Build-time model prefetch (requires public repos or HF token in build env)
130
+ RUN python -m app.scripts.prefetch_models
131
+
132
+ RUN chown -R user:user /app && chmod +x /app/start.sh
133
+ USER user
134
+
135
+ EXPOSE 7860
136
+ CMD ["./start.sh"]
137
+ ```
138
+
139
+ Notes:
140
+ - If private model repos are used, the build environment needs an HF_TOKEN.
141
+ - This increases image size but reduces startup wait caused by downloads.
142
+
143
+ ### 1.3 Verify HF cache is reused at runtime
144
+
145
+ Target file: app/services/hf_hub_service.py
146
+
147
+ Behavior to enforce:
148
+ - Keep deterministic local_dir path under /app/.hf_cache.
149
+ - Log cache hits clearly before download attempt.
150
+
151
+ Add logic before snapshot_download call:
152
+
153
+ ```python
154
+ cached = self.get_cached_path(repo_id)
155
+ if cached and not force_download:
156
+ logger.info(f"Using cached repo for {repo_id}: {cached}")
157
+ return cached
158
+ ```
159
+
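The snippet above relies on a `get_cached_path` helper on the service. The existing method in app/services/hf_hub_service.py presumably already implements this check; the standalone sketch below only illustrates the expected contract, assuming the same deterministic directory layout used by the download path:

```python
from pathlib import Path
from typing import Optional


def get_cached_path(cache_dir: str, repo_id: str) -> Optional[str]:
    """Return the local repo directory for repo_id if it already exists, else None."""
    # Mirror the deterministic layout used when downloading: "org/name" -> "org--name".
    local_dir = Path(cache_dir) / repo_id.replace("/", "--")
    # Treat the repo as cached only when the directory exists and is non-empty.
    if local_dir.is_dir() and any(local_dir.iterdir()):
        return str(local_dir)
    return None


if __name__ == "__main__":
    # "org/vit-base-final" is a placeholder repo id used only for illustration.
    print(get_cached_path("/app/.hf_cache", "org/vit-base-final"))
```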
160
+ ## Phase 2: Parallelize Submodel Loading
161
+
162
+ Target file: app/services/model_registry.py
163
+
164
+ Current behavior:
165
+ - Submodels are loaded one by one.
166
+
167
+ New behavior:
168
+ - Load submodels concurrently with bounded parallelism.
169
+
170
+ Implementation steps:
171
+ 1. Add a semaphore, for example max concurrency 2.
172
+ 2. Replace sequential loop with asyncio.gather.
173
+ 3. Keep deterministic final registration and clear error propagation.
174
+
175
+ Implementation sketch:
176
+
177
+ ```python
178
+ sem = asyncio.Semaphore(2)
179
+
180
+ async def _load_with_limit(repo_id: str) -> None:
181
+ async with sem:
182
+ await self._load_submodel(repo_id)
183
+
184
+ tasks = [_load_with_limit(repo_id) for repo_id in submodel_repos]
185
+ results = await asyncio.gather(*tasks, return_exceptions=True)
186
+ errors = [r for r in results if isinstance(r, Exception)]
187
+ if errors:
188
+ raise RuntimeError(f"Failed to load one or more submodels: {errors}")
189
+ ```
190
+
191
+ Reason for bounded parallelism:
192
+ - Reduces startup time without overwhelming memory/network in GPU Space containers.
193
+
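To sanity-check the expected wall-clock win, the loading schedule can be simulated outside the registry. The sketch below is a toy model using sleep calls scaled down 10x from the observed per-model durations, not the real loaders:

```python
import asyncio
import time

# Simulated per-model load times, loosely based on the observed breakdown
# (scaled down 10x so the demo runs quickly).
LOAD_TIMES = {
    "cnn-transfer-final": 0.1,
    "vit-base-final": 1.3,
    "deit-distilled-final": 0.5,
    "gradfield-cnn-final": 0.1,
}


async def fake_load(name: str, seconds: float, sem: asyncio.Semaphore) -> None:
    async with sem:  # at most max_concurrency "loads" run at once
        await asyncio.sleep(seconds)


async def run(max_concurrency: int) -> float:
    sem = asyncio.Semaphore(max_concurrency)
    t0 = time.perf_counter()
    await asyncio.gather(*(fake_load(name, s, sem) for name, s in LOAD_TIMES.items()))
    return time.perf_counter() - t0


if __name__ == "__main__":
    for n in (1, 2):
        print(f"max_concurrency={n}: {asyncio.run(run(n)):.2f}s")
    # Sequential (~2.0s here, ~20s real) vs two concurrent loads (~1.3s here, ~13s real):
    # the vit-base load dominates the critical path either way.
```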
194
+ ## Phase 3: Add Startup Instrumentation For Reliable Comparisons
195
+
196
+ Target file: app/main.py
197
+
198
+ Add timing markers:
199
+ - App startup begin timestamp.
200
+ - Model loading start and end.
201
+ - Total lifespan startup duration.
202
+
203
+ Implementation sketch:
204
+
205
+ ```python
206
+ import time
207
+
208
+ startup_t0 = time.perf_counter()
209
+ ...
210
+ model_t0 = time.perf_counter()
211
+ await registry.load_from_fusion_repo(settings.HF_FUSION_REPO_ID)
212
+ model_dt = time.perf_counter() - model_t0
213
+ logger.info(f"Model load duration_seconds={model_dt:.3f}")
214
+ ...
215
+ startup_dt = time.perf_counter() - startup_t0
216
+ logger.info(f"Startup total duration_seconds={startup_dt:.3f}")
217
+ ```
218
+
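For reference, these markers normally live in the FastAPI lifespan handler in app/main.py. A minimal self-contained sketch, with the registry call replaced by a stub because the real call depends on the existing registry API:

```python
import logging
import time
from contextlib import asynccontextmanager

from fastapi import FastAPI

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("startup")


async def load_models() -> None:
    """Stub standing in for registry.load_from_fusion_repo(settings.HF_FUSION_REPO_ID)."""


@asynccontextmanager
async def lifespan(app: FastAPI):
    startup_t0 = time.perf_counter()

    model_t0 = time.perf_counter()
    await load_models()
    logger.info("Model load duration_seconds=%.3f", time.perf_counter() - model_t0)

    logger.info("Startup total duration_seconds=%.3f", time.perf_counter() - startup_t0)
    yield  # application serves requests while suspended here
    # Shutdown work, if any, goes after the yield.


app = FastAPI(lifespan=lifespan)
```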
219
+ ## Phase 4: Runtime Hygiene (Low Effort, Prevent Hidden Slowdowns)
220
+
221
+ ### 4.1 Fix OMP setting warning
222
+
223
+ Target file: start.sh
224
+
225
+ Add a valid default:
226
+
227
+ ```bash
228
+ export OMP_NUM_THREADS="${OMP_NUM_THREADS:-1}"
229
+ ```
230
+
231
+ This removes:
232
+ - libgomp: Invalid value for environment variable OMP_NUM_THREADS
233
+
234
+ ### 4.2 Pin scikit-learn to training-compatible version
235
+
236
+ Target file: requirements.txt
237
+
238
+ The observed warning indicates the model pickle was produced with scikit-learn 1.6.1 while the runtime uses 1.8.0.
239
+
240
+ Pin:
241
+
242
+ ```text
243
+ scikit-learn==1.6.1
244
+ ```
245
+
246
+ This is not directly a speed optimization, but it removes compatibility risk during cold start model deserialization.
247
+
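To make the pin enforceable rather than advisory, startup can fail fast if the installed scikit-learn version drifts from the training version. A minimal sketch; the expected version string is an assumption taken from the warning and should track the training environment:

```python
import sklearn

# Version that produced the fusion model pickle, per the startup warning (assumed here).
EXPECTED_SKLEARN_VERSION = "1.6.1"

if sklearn.__version__ != EXPECTED_SKLEARN_VERSION:
    raise RuntimeError(
        f"scikit-learn {sklearn.__version__} is installed, but the fusion model "
        f"was serialized with {EXPECTED_SKLEARN_VERSION}; update requirements.txt "
        "or re-export the model."
    )
```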
248
+ ## Validation and Benchmark Protocol
249
+
250
+ Use the same procedure before and after changes; a small helper for computing the segment durations from recorded timestamps is sketched after this list.
251
+
252
+ 1. Force a cold deployment in HF Space.
253
+ 2. Record these timestamps from logs:
254
+ - Build queued time
255
+ - Application startup time
256
+ - Starting DeepFake Detector API
257
+ - Models loaded successfully
258
+ - Application startup complete
259
+ 3. Compute:
260
+ - Queue/build to app startup
261
+ - App startup to model-ready
262
+ - API model load phase
263
+ 4. Capture per-model load durations from logs.
264
+ 5. Save a comparison table in this file.
265
+
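A small helper keeps the computed segments consistent across runs. The sketch below assumes the timestamps are copied from the logs in HH:MM:SS form; the values shown are the baseline run:

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Timestamps copied from the Space logs; the baseline run is shown as an example.
marks = {
    "build_queued": "04:23:34",
    "app_startup": "04:24:02",
    "api_start": "04:25:15",      # "Starting DeepFake Detector API..."
    "models_loaded": "04:25:36",  # "Models loaded successfully!"
}


def seconds_between(start: str, end: str) -> int:
    """Duration in whole seconds between two HH:MM:SS timestamps on the same day."""
    return int((datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds())


print("Queue/build to app startup:", seconds_between(marks["build_queued"], marks["app_startup"]), "s")
print("App startup to model-ready:", seconds_between(marks["app_startup"], marks["models_loaded"]), "s")
print("API model load phase:", seconds_between(marks["api_start"], marks["models_loaded"]), "s")
```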
266
+ ## Comparison Template (Fill After Implementation)
267
+
268
+ | Metric | Baseline (2026-04-20) | After Phase 1 | After Phase 2 | Final |
269
+ |---|---:|---:|---:|---:|
270
+ | Queue/build to app startup | 28s | | | |
271
+ | App startup to model-ready | 94s | | | |
272
+ | API model load phase | 21s | | | |
273
+ | vit-base load | 13s | | | |
274
+ | deit-distilled load | 5s | | | |
275
+ | Total visible build timed stages | 20.4s | | | |
276
+
277
+ ## Expected Outcome
278
+
279
+ Primary expected wins:
280
+ 1. Reduced startup latency by avoiding runtime model downloads.
281
+ 2. Reduced model load wall-clock via parallel submodel loads.
282
+ 3. Stable and comparable timing data for iterative tuning.
283
+
284
+ Secondary expected wins:
285
+ 1. Cleaner startup logs (no OMP warning).
286
+ 2. Lower risk from sklearn deserialization mismatch.
287
+
288
+ ## Rollback Plan
289
+
290
+ If anything regresses:
291
+ 1. Revert parallel loading only and keep build-time prefetch.
292
+ 2. Revert build-time prefetch and restore runtime download flow.
293
+ 3. Keep instrumentation to retain comparability.
294
+
295
+ ## Notes
296
+
297
+ - This plan intentionally keeps current FastAPI inference architecture unchanged.
298
+ - Triton feasibility can be revisited after cold start metrics improve and stabilize.
Dockerfile CHANGED
@@ -11,7 +11,8 @@ ENV PYTHONDONTWRITEBYTECODE=1 \
11
  PYTHONUNBUFFERED=1 \
12
  PIP_NO_CACHE_DIR=1 \
13
  PIP_DISABLE_PIP_VERSION_CHECK=1 \
14
- PORT=7860
 
15
 
16
  # Install system dependencies
17
  RUN apt-get update && apt-get install -y --no-install-recommends \
@@ -30,8 +31,9 @@ ENV PATH="/home/user/.local/bin:$PATH"
30
  COPY --chown=user:user requirements.txt .
31
  RUN pip install --no-cache-dir --upgrade -r requirements.txt
32
 
33
- # Copy application code
34
- COPY --chown=user:user . /app
 
35
 
36
  # Switch to root to create cache directory and set permissions
37
  USER root
@@ -40,6 +42,12 @@ RUN mkdir -p /app/.hf_cache && chown -R user:user /app/.hf_cache && chmod +x /ap
40
  # Switch back to user
41
  USER user
42
 
43
  # Expose default app port
44
  EXPOSE 7860
45
 
 
11
  PYTHONUNBUFFERED=1 \
12
  PIP_NO_CACHE_DIR=1 \
13
  PIP_DISABLE_PIP_VERSION_CHECK=1 \
14
+ PORT=7860 \
15
+ HF_CACHE_DIR=/app/.hf_cache
16
 
17
  # Install system dependencies
18
  RUN apt-get update && apt-get install -y --no-install-recommends \
 
31
  COPY --chown=user:user requirements.txt .
32
  RUN pip install --no-cache-dir --upgrade -r requirements.txt
33
 
34
+ # Copy only files required for model prefetch first.
35
+ COPY --chown=user:user app /app/app
36
+ COPY --chown=user:user start.sh /app/start.sh
37
 
38
  # Switch to root to create cache directory and set permissions
39
  USER root
 
42
  # Switch back to user
43
  USER user
44
 
45
+ # Prefetch model artifacts at build time so startup does not wait on model downloads.
46
+ RUN python -m app.scripts.prefetch_models
47
+
48
+ # Copy full project contents after prefetch so docs/tests edits do not invalidate prefetch layers.
49
+ COPY --chown=user:user . /app
50
+
51
  # Expose default app port
52
  EXPOSE 7860
53
 
README.md CHANGED
@@ -101,13 +101,7 @@ Recommended path is the Bash deploy script.
101
 
102
  1. Configure [backend/.env](.env) from [backend/.env.example](.env.example)
103
  2. Ensure `HF_SPACE_URL` and related deploy variables are set
104
- 3. Run from backend folder:
105
-
106
- ```bash
107
- bash ./deploy-to-hf.sh
108
- ```
109
-
110
- Or run from repo root:
111
 
112
  ```bash
113
  bash ./backend/deploy-to-hf.sh
 
101
 
102
  1. Configure [backend/.env](.env) from [backend/.env.example](.env.example)
103
  2. Ensure `HF_SPACE_URL` and related deploy variables are set
104
+ 3. Run from the repo root:
105
 
106
  ```bash
107
  bash ./backend/deploy-to-hf.sh
app/scripts/__init__.py ADDED
@@ -0,0 +1 @@
1
+ """Helper scripts for backend operational tasks."""
app/scripts/prefetch_models.py ADDED
@@ -0,0 +1,23 @@
1
+ """Build-time model prefetch utility for reducing cold-start downloads."""
2
+
3
+ import asyncio
4
+
5
+ from app.core.config import settings
6
+ from app.core.logging import get_logger, setup_logging
7
+ from app.services.model_registry import get_model_registry
8
+
9
+
10
+ setup_logging()
11
+ logger = get_logger(__name__)
12
+
13
+
14
+ async def main() -> None:
15
+ """Download fusion and submodel repositories into the configured HF cache."""
16
+ logger.info("Starting build-time model prefetch for %s", settings.HF_FUSION_REPO_ID)
17
+ registry = get_model_registry()
18
+ await registry.load_from_fusion_repo(settings.HF_FUSION_REPO_ID, force_reload=True)
19
+ logger.info("Build-time model prefetch completed")
20
+
21
+
22
+ if __name__ == "__main__":
23
+ asyncio.run(main())
app/services/hf_hub_service.py CHANGED
@@ -69,6 +69,11 @@ class HFHubService:
69
  logger.info(f"Downloading repo: {repo_id} (revision={revision}, force={force_download})")
70
 
71
  try:
72
  # Use local_dir instead of cache_dir to avoid symlink issues on Windows
73
  repo_name = repo_id.replace("/", "--")
74
  local_dir = Path(self.cache_dir) / repo_name
 
69
  logger.info(f"Downloading repo: {repo_id} (revision={revision}, force={force_download})")
70
 
71
  try:
72
+ cached_path = self.get_cached_path(repo_id)
73
+ if cached_path and not force_download:
74
+ logger.info(f"Using cached repo for {repo_id}: {cached_path}")
75
+ return cached_path
76
+
77
  # Use local_dir instead of cache_dir to avoid symlink issues on Windows
78
  repo_name = repo_id.replace("/", "--")
79
  local_dir = Path(self.cache_dir) / repo_name