akagtag committed · Commit a38026a · 1 Parent(s): 8363f67

fix: update Dockerfile dependencies, remove audio processing, and replace models
Dockerfile CHANGED
@@ -1,31 +1,31 @@
 FROM python:3.11-slim
 
-RUN apt-get update && apt-get install -y \
-    ffmpeg libgl1 libglib2.0-0 libsm6 libxext6 libxrender-dev libgles2 libegl1 \
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    ffmpeg \
+    libgl1 \
+    libglib2.0-0 \
+    libsm6 \
+    libxext6 \
+    libxrender1 \
+    libgles2 \
+    libegl1 \
+    libgbm1 \
+    libgomp1 \
     && rm -rf /var/lib/apt/lists/*
 
 WORKDIR /app
+
 COPY requirements.txt .
-RUN python - <<'PY'
-from pathlib import Path
-lines = Path("requirements.txt").read_text(encoding="utf-8").splitlines()
-filtered = [
-    line for line in lines
-    if not line.strip().startswith("torch>=")
-    and not line.strip().startswith("torchvision>=")
-]
-Path("/tmp/requirements-no-torch.txt").write_text("\n".join(filtered) + "\n", encoding="utf-8")
-PY
-RUN pip install --no-cache-dir --extra-index-url https://download.pytorch.org/whl/cpu \
-    torch==2.6.0+cpu torchvision==0.21.0+cpu \
-    -r /tmp/requirements-no-torch.txt
+RUN pip install --no-cache-dir --upgrade pip && \
+    pip install --no-cache-dir -r requirements.txt
 
 COPY . .
 
 ENV MODEL_CACHE_DIR=/data/models
 ENV TOKENIZERS_PARALLELISM=false
+ENV MESA_GL_VERSION_OVERRIDE=3.3
+ENV PYOPENGL_PLATFORM=egl
 ENV PYTHONUNBUFFERED=1
-ENV PYTHONPATH=/app
 
 EXPOSE 7860
 CMD ["python", "spaces/app.py"]
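After this Dockerfile change, a quick way to confirm the dynamic linker can now resolve MediaPipe's OpenGL ES dependency is a `ctypes` probe (a hedged sketch; the helper is illustrative, and only the library name comes from the error this commit fixes):

```python
import ctypes


def can_load(libname: str) -> bool:
    """Return True if the dynamic linker can resolve `libname`.

    MediaPipe needs libGLESv2.so.2 even for CPU-only inference; a False
    here corresponds to the startup failure
    `libGLESv2.so.2: cannot open shared object file`.
    """
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False


# Run inside the built image: can_load("libGLESv2.so.2") should become
# True once the libgles2 apt package is installed.
```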
FIX.md ADDED
@@ -0,0 +1,174 @@
+# FIX.md — How to Stop All Startup Errors
+
+The logs show the OLD engine code is still running. The files from the previous
+session were not copied into the project. Do these steps in order.
+
+---
+
+## Step 1 — Replace fingerprint engine
+
+Copy `fingerprint_engine.py` (from outputs) to:
+
+```
+src/engines/fingerprint/engine.py
+```
+
+This removes ALL broken models:
+
+- `yermandy/deepfake-detection` — gone
+- `yermandy/GenD_CLIP_L_14` — gone
+- `yermandy/GenD_DINOv3_L` — gone
+- `Wvolf/ViT_Deepfake_Detection` — gone
+- `trust_remote_code` kwarg bug — fixed
+
+It replaces them with 3 working models:
+
+- `Organika/sdxl-detector`
+- `haywoodsloan/ai-image-detector-deploy`
+- `dima806/deepfake_vs_real_image_detection`
+
+---
+
+## Step 2 — Replace coherence engine
+
+Copy `coherence_engine.py` (from outputs) to:
+
+```
+src/engines/coherence/engine.py
+```
+
+This removes the broken wav2vec model
+(`nii-yamagishilab/wav2vec-large-anti-deepfake-nda`), which has incompatible
+weights and was producing random output anyway. Coherence now runs visual-only
+(FaceNet + MediaPipe).
+
+---
+
+## Step 3 — Replace SSTGNN engine
+
+Copy `sstgnn_engine.py` (from outputs) to:
+
+```
+src/engines/sstgnn/engine.py
+```
+
+Removes `Wvolf/ViT_Deepfake_Detection`. Uses `dima806` + `prithivMLmods` only.
+
+---
+
+## Step 4 — Fix the Dockerfile (libGLESv2 error)
+
+Replace your `Dockerfile` with this exactly:
+
+```dockerfile
+FROM python:3.11-slim
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    ffmpeg \
+    libgl1 \
+    libglib2.0-0 \
+    libsm6 \
+    libxext6 \
+    libxrender1 \
+    libgles2 \
+    libegl1 \
+    libgbm1 \
+    libgomp1 \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /app
+
+COPY requirements.txt .
+RUN pip install --no-cache-dir --upgrade pip && \
+    pip install --no-cache-dir -r requirements.txt
+
+COPY . .
+
+ENV MODEL_CACHE_DIR=/data/models
+ENV TOKENIZERS_PARALLELISM=false
+ENV MESA_GL_VERSION_OVERRIDE=3.3
+ENV PYOPENGL_PLATFORM=egl
+ENV PYTHONUNBUFFERED=1
+
+EXPOSE 7860
+CMD ["python", "spaces/app.py"]
+```
+
+The key additions are `libgles2 libegl1 libgbm1` — MediaPipe requires OpenGL ES
+even for CPU-only inference. Without these packages it always throws
+`libGLESv2.so.2: cannot open shared object file`.
+
+---
+
+## Step 5 — Fix requirements.txt (torch CVE block)
+
+Replace the torch lines in `requirements.txt`:
+
+```
+torch>=2.6.0
+torchvision>=0.21.0
+torchaudio>=2.6.0
+```
+
+Torch < 2.6 blocks loading `.pt` files due to CVE-2025-32434.
+`Wvolf/ViT_Deepfake_Detection` uses `.pt` — it will NEVER load on torch < 2.6.
+Since you're removing that model anyway, this is a safety measure for other
+models.
+
+---
+
+## Step 6 — Rebuild and redeploy
+
+```bash
+# If running locally / Docker:
+docker build --no-cache -t genai-deepdetect .
+docker run -p 7860:7860 genai-deepdetect
+
+# If on HuggingFace Spaces:
+git add src/engines/fingerprint/engine.py
+git add src/engines/coherence/engine.py
+git add src/engines/sstgnn/engine.py
+git add Dockerfile
+git add requirements.txt
+git commit -m "fix: remove broken models, add libgles2 for mediapipe"
+git push
+```
+
+HF Spaces will rebuild the Docker image automatically on push. Watch the build
+logs — the apt-get install should now include libgles2.
+
+---
+
+## What the fixed startup should look like
+
+```
+Fingerprint engine: loading models...
+✓ detector: Organika/sdxl-detector
+✓ detector: haywoodsloan/ai-image-detector-deploy
+✓ detector: dima806/deepfake_vs_real_image_detection
+✓ CLIP ViT-L/14 loaded for generator attribution
+Fingerprint engine ready: 3 detectors, CLIP=ok
+
+Coherence engine: loading models...
+✓ FaceNet MTCNN + InceptionResnetV1 (VGGFace2) loaded
+✓ MediaPipe FaceMesh loaded ← only works after Dockerfile fix
+Coherence engine ready: facenet=ok, mediapipe=ok
+
+SSTGNN engine: loading models...
+✓ SSTGNN detector: dima806/deepfake_vs_real_image_detection
+✓ SSTGNN detector: prithivMLmods/Deep-Fake-Detector-Model
+✓ MediaPipe FaceMesh loaded for SSTGNN graph
+SSTGNN engine ready: 2 detectors, mediapipe=ok
+```
+
+---
+
+## Summary
+
+| Error | Cause | Fix |
+| --- | --- | --- |
+| `yermandy/*` warnings | custom GenD arch | removed from engine |
+| `Wvolf/*` torch CVE error | `.pt` file + torch < 2.6 | removed from engine |
+| `trust_remote_code` TypeError | duplicate kwarg in `_build_image_classifier` | removed from all `pipeline()` calls |
+| `wav2vec` MISSING/UNEXPECTED keys | custom `m_ssl.*` namespace, incompatible | removed from engine |
+| `libGLESv2.so.2` not found | missing apt packages in Docker | add `libgles2 libegl1 libgbm1` |
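The torch CVE row above reduces to a version-floor check; a minimal sketch under the assumption of plain numeric versions (the helper name is hypothetical, and local build tags like `+cpu` are stripped before comparing):

```python
def meets_torch_floor(installed: str, floor: str = "2.6.0") -> bool:
    """True if `installed` satisfies the requirements.txt floor of 2.6.0.

    Assumes plain numeric versions; strips local build tags, so
    "2.6.0+cpu" is treated as "2.6.0".
    """
    def parts(version: str) -> tuple[int, ...]:
        # "2.6.0+cpu" -> (2, 6, 0); compare components numerically so
        # "2.10.0" correctly sorts above "2.6.0".
        return tuple(int(p) for p in version.split("+")[0].split(".")[:3])

    return parts(installed) >= parts(floor)
```

A plain string comparison would get this wrong ("2.10.0" < "2.6.0" lexically), which is why the components are compared as integers.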
requirements.txt CHANGED
@@ -12,6 +12,7 @@ transformers>=4.40.0
 timm>=1.0.0
 torch>=2.6.0
 torchvision>=0.21.0
+torchaudio>=2.6.0
 
 # ML - coherence
 # facenet-pytorch currently has limited support on newer Python versions.
runpod_handler.py CHANGED
@@ -16,7 +16,7 @@ from src.engines.fingerprint.engine import FingerprintEngine
 from src.engines.sstgnn.engine import SSTGNNEngine
 from src.explainability.explainer import explain
 from src.fusion.fuser import fuse
-from src.services.media_utils import extract_audio_waveform, extract_video_frames
+from src.services.media_utils import extract_video_frames
 
 _fp = FingerprintEngine()
 _co = CoherenceEngine()
@@ -47,17 +47,11 @@ def handler(job: dict) -> dict:
 
     try:
         frames = extract_video_frames(tmp_path, max_frames=300)
-        audio = extract_audio_waveform(tmp_path, sample_rate=16000)
     finally:
         os.unlink(tmp_path)
 
-    audio_waveform = None
-    audio_sample_rate = 16000
-    if audio is not None:
-        audio_waveform, audio_sample_rate = audio
-
     fp = _fp.run_video(frames)
-    co = _co.run_video(frames, audio_waveform, audio_sample_rate)
+    co = _co.run_video(frames)
     st = _st.run_video(frames)
     verdict, conf, generator = fuse([fp, co, st], is_video=True)
 
src/api/main.py CHANGED
@@ -27,7 +27,7 @@ from src.services.inference_router import (
     is_runpod_configured,
     route_inference,
 )
-from src.services.media_utils import extract_audio_waveform, extract_video_frames
+from src.services.media_utils import extract_video_frames
 from src.types import DetectionResponse, EngineResult
 
 logger = logging.getLogger(__name__)
@@ -93,10 +93,7 @@ def _model_inventory() -> dict[str, object]:
         "attribution_model": "openai/clip-vit-large-patch14",
     },
     "coherence": {
-        "audio_deepfake_model": os.environ.get(
-            "COHERENCE_AUDIO_MODEL_ID",
-            "",
-        ),
+        "audio_deepfake_model": "disabled (visual-only coherence)",
         "facial_landmarks": "mediapipe FaceMesh/FaceLandmarker",
         "temporal_embedding": "facenet-pytorch InceptionResnetV1(vggface2) when available",
     },
@@ -391,9 +388,7 @@ async def detect_video(file: UploadFile = File(...)) -> DetectionResponse:
         tmp_path = tmp.name
 
     try:
-        frames_task = asyncio.to_thread(extract_video_frames, tmp_path, MAX_FRAMES)
-        audio_task = asyncio.to_thread(extract_audio_waveform, tmp_path, 16000)
-        frames, audio = await asyncio.gather(frames_task, audio_task)
+        frames = await asyncio.to_thread(extract_video_frames, tmp_path, MAX_FRAMES)
     finally:
         Path(tmp_path).unlink(missing_ok=True)
 
@@ -401,14 +396,9 @@ async def detect_video(file: UploadFile = File(...)) -> DetectionResponse:
         raise HTTPException(status_code=422, detail="Could not extract frames")
 
     await _ensure_models_loaded()
-    audio_waveform = None
-    audio_sample_rate = 16000
-    if audio is not None:
-        audio_waveform, audio_sample_rate = audio
-
    fp, co, st = await asyncio.gather(
         asyncio.to_thread(_fp.run_video, frames),
-        asyncio.to_thread(_co.run_video, frames, audio_waveform, audio_sample_rate),
+        asyncio.to_thread(_co.run_video, frames),
         asyncio.to_thread(_st.run_video, frames),
     )
 
src/engines/coherence/detector.py CHANGED
@@ -9,7 +9,7 @@ import tempfile
 import numpy as np
 
 from src.types import EngineResult
-from src.services.media_utils import extract_audio_waveform, extract_video_frames
+from src.services.media_utils import extract_video_frames
 
 from .engine import CoherenceEngine
 
@@ -18,26 +18,21 @@ class CoherenceDetector(CoherenceEngine):
     threshold = 0.5
 
     def detect_bytes(self, video_bytes: bytes) -> EngineResult:
-        frames, audio_waveform, audio_sample_rate = self._extract_video_media(video_bytes)
+        frames = self._extract_video_frames(video_bytes)
         if not frames:
             return self._error_result(0.0)
         try:
-            return self.run_video(frames, audio_waveform, audio_sample_rate)
+            return self.run_video(frames)
         except Exception:
             return self._error_result(0.0)
 
-    def _extract_video_media(self, video_bytes: bytes) -> tuple[list[np.ndarray], np.ndarray | None, int]:
+    def _extract_video_frames(self, video_bytes: bytes) -> list[np.ndarray]:
         with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
             tmp.write(video_bytes)
             tmp_path = tmp.name
 
         try:
-            frames = extract_video_frames(tmp_path, max_frames=64)
-            audio = extract_audio_waveform(tmp_path, sample_rate=16000)
-            if audio is None:
-                return frames, None, 16000
-            waveform, sample_rate = audio
-            return frames, waveform, sample_rate
+            return extract_video_frames(tmp_path, max_frames=64)
         finally:
             os.unlink(tmp_path)
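The refactored extraction helper keeps a write-temp-file, read, always-unlink lifecycle; a standalone sketch of that pattern (helper names are illustrative, not from the repo):

```python
import os
import tempfile


def bytes_to_tempfile(data: bytes, suffix: str = ".mp4") -> str:
    # delete=False so the file survives the context manager and can be
    # reopened by path (e.g. by a video decoder); caller must unlink it.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(data)
        return tmp.name


def with_tempfile(data: bytes, fn):
    # Run fn(path) and unconditionally remove the temp file afterwards,
    # mirroring the try/finally around extract_video_frames.
    path = bytes_to_tempfile(data)
    try:
        return fn(path)
    finally:
        os.unlink(path)
```

The `finally` guarantees cleanup even when `fn` raises, which is what prevents temp-file leaks when a corrupt upload makes frame extraction fail.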
 
src/engines/coherence/engine.py CHANGED
@@ -6,7 +6,6 @@ import threading
 import time
 import urllib.request
 from pathlib import Path
-from typing import Any
 
 import numpy as np
 from PIL import Image
@@ -21,8 +20,6 @@ _mtcnn = None
 _resnet = None
 _face_mesh = None
 _torch = None
-_audio_detector = None
-_DEFAULT_AUDIO_MODEL_ID = ""
 
 
 def _skip_model_loads() -> bool:
@@ -34,14 +31,6 @@ def _skip_model_loads() -> bool:
     }
 
 
-def _get_pipeline():
-    try:
-        from transformers import pipeline as hf_pipeline  # type: ignore
-    except Exception:
-        from transformers.pipelines import pipeline as hf_pipeline  # type: ignore
-    return hf_pipeline
-
-
 def _short_error(exc: Exception, *, limit: int = 300) -> str:
     message = " ".join(str(exc).strip().split())
     if len(message) > limit:
@@ -97,6 +86,7 @@ def _build_face_mesh():
         max_num_faces=1,
         refine_landmarks=True,
         min_detection_confidence=0.5,
+        min_tracking_confidence=0.5,
     )
 
     from mediapipe.tasks import python as mp_tasks_python  # type: ignore
@@ -112,33 +102,8 @@ def _build_face_mesh():
     return _TasksFaceMeshAdapter(mp, landmarker)
 
 
-def _build_audio_classifier(model_id: str) -> Any:
-    pipeline = _get_pipeline()
-
-    cache_dir = os.environ.get("MODEL_CACHE_DIR", "/tmp/models")
-    attempts = (
-        {"trust_remote_code": True, "model_kwargs": {"cache_dir": cache_dir}},
-        {"trust_remote_code": True},
-        {"model_kwargs": {"cache_dir": cache_dir}},
-        {},
-    )
-    last_exc: Exception | None = None
-    for kwargs in attempts:
-        try:
-            return pipeline(
-                "audio-classification",
-                model=model_id,
-                **kwargs,
-            )
-        except Exception as exc:
-            last_exc = exc
-    if last_exc is not None:
-        raise last_exc
-    raise RuntimeError(f"Unable to load audio-classification pipeline for {model_id}")
-
-
 def _load() -> None:
-    global _mtcnn, _resnet, _face_mesh, _load_attempted, _torch, _audio_detector
+    global _mtcnn, _resnet, _face_mesh, _load_attempted, _torch
     if _load_attempted:
         return
 
@@ -173,15 +138,6 @@ def _load() -> None:
             _short_error(exc),
         )
 
-    model_id = os.environ.get("COHERENCE_AUDIO_MODEL_ID", _DEFAULT_AUDIO_MODEL_ID).strip()
-    if not model_id:
-        logger.info("Coherence audio model disabled (set COHERENCE_AUDIO_MODEL_ID to enable).")
-    else:
-        try:
-            _audio_detector = _build_audio_classifier(model_id)
-        except Exception as exc:
-            logger.warning("Coherence audio model unavailable (%s): %s", model_id, _short_error(exc))
-
     logger.info("Coherence model load attempt complete")
 
 
@@ -238,12 +194,7 @@ class CoherenceEngine:
             logger.warning("Coherence image scoring failed: %s", exc)
             return 0.35
 
-    def run_video(
-        self,
-        frames: list[np.ndarray],
-        audio_waveform: np.ndarray | None = None,
-        audio_sample_rate: int = 16000,
-    ) -> EngineResult:
+    def run_video(self, frames: list[np.ndarray]) -> EngineResult:
         t0 = time.perf_counter()
         self._ensure()
 
@@ -265,8 +216,7 @@ class CoherenceEngine:
         delta = self._embedding_variance(frames)
         jerk = self._landmark_jerk(frames)
         blink = self._blink_anomaly(frames)
-        audio = self._audio_deepfake_score(audio_waveform, audio_sample_rate)
-        score = float(np.clip(delta * 0.35 + jerk * 0.30 + blink * 0.15 + audio * 0.20, 0.0, 1.0))
+        score = float(np.clip(delta * 0.45 + jerk * 0.35 + blink * 0.20, 0.0, 1.0))
 
         return EngineResult(
             engine="coherence",
@@ -276,50 +226,11 @@ class CoherenceEngine:
             explanation=(
                 f"Embedding variance {delta:.2f}, "
                 f"landmark jerk {jerk:.2f}, "
-                f"blink anomaly {blink:.2f}, "
-                f"audio deepfake score {audio:.2f}."
+                f"blink anomaly {blink:.2f}."
             ),
             processing_time_ms=(time.perf_counter() - t0) * 1000,
         )
 
-    def _audio_deepfake_score(self, waveform: np.ndarray | None = None, sample_rate: int = 16000) -> float:
-        if _audio_detector is None:
-            return 0.5
-        if waveform is None or waveform.size == 0:
-            return 0.5
-
-        max_seconds = int(os.environ.get("COHERENCE_AUDIO_MAX_SECONDS", "30"))
-        max_samples = max(16000, sample_rate * max_seconds)
-        if waveform.size > max_samples:
-            waveform = waveform[:max_samples]
-
-        try:
-            preds = _audio_detector(
-                {"array": waveform.astype(np.float32), "sampling_rate": sample_rate},
-                top_k=5,
-            )
-        except Exception:
-            return 0.5
-
-        if isinstance(preds, dict):
-            preds = [preds]
-        if preds and isinstance(preds[0], list):
-            preds = preds[0]
-        if not preds:
-            return 0.5
-
-        fake_keywords = ("spoof", "fake", "deepfake", "synthetic", "generated")
-        best = 0.0
-        for pred in preds:
-            label = str(pred.get("label", "")).lower()
-            score = float(pred.get("score", 0.0))
-            if any(keyword in label for keyword in fake_keywords):
-                best = max(best, score)
-
-        if best == 0.0:
-            return 0.5
-        return float(np.clip(best, 0.0, 1.0))
-
     def _embedding_variance(self, frames: list[np.ndarray]) -> float:
         if _mtcnn is None or _resnet is None or _torch is None:
             return 0.5
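The rebalanced score line redistributes the removed 0.20 audio share across the three visual signals; a hedged sketch of that combination in isolation (the function name is illustrative, the weights and clipping come from the diff):

```python
import numpy as np


def coherence_score(delta: float, jerk: float, blink: float) -> float:
    # Visual-only fusion: embedding variance 0.45, landmark jerk 0.35,
    # blink anomaly 0.20, clipped to [0, 1] as in the updated run_video.
    return float(np.clip(delta * 0.45 + jerk * 0.35 + blink * 0.20, 0.0, 1.0))
```

Because the weights sum to 1.0, inputs already in [0, 1] cannot exceed the clip bounds; the clip only matters for out-of-range signal values.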
src/services/hf_inference_client.py CHANGED
@@ -22,7 +22,7 @@ import httpx
 logger = logging.getLogger(__name__)
 
 _HF_API_BASE = "https://api-inference.huggingface.co"
-_DEFAULT_MODEL = "Wvolf/ViT_Deepfake_Detection"
+_DEFAULT_MODEL = "dima806/deepfake_vs_real_image_detection"
 
 
 class HFInferenceUnavailable(RuntimeError):
src/services/runpod_client.py CHANGED
@@ -45,7 +45,7 @@ class RunPodClient:
     def __init__(self) -> None:
         self._api_key = os.environ.get("RUNPOD_API_KEY", "")
         self._endpoint_id = os.environ.get("RUNPOD_ENDPOINT_ID", "")
-        self._model_id = os.environ.get("RUNPOD_MODEL_ID", "Wvolf/ViT_Deepfake_Detection")
+        self._model_id = os.environ.get("RUNPOD_MODEL_ID", "dima806/deepfake_vs_real_image_detection")
 
     @property
     def available(self) -> bool: