fix: update Dockerfile dependencies, remove audio processing, and replace models
- Dockerfile +16 -16
- FIX.md +174 -0
- requirements.txt +1 -0
- runpod_handler.py +2 -8
- src/api/main.py +4 -14
- src/engines/coherence/detector.py +5 -10
- src/engines/coherence/engine.py +5 -94
- src/services/hf_inference_client.py +1 -1
- src/services/runpod_client.py +1 -1
Dockerfile
CHANGED

```diff
@@ -1,31 +1,31 @@
 FROM python:3.11-slim
 
-RUN apt-get update && apt-get install -y \
-    ffmpeg \
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    ffmpeg \
+    libgl1 \
+    libglib2.0-0 \
+    libsm6 \
+    libxext6 \
+    libxrender1 \
+    libgles2 \
+    libegl1 \
+    libgbm1 \
+    libgomp1 \
     && rm -rf /var/lib/apt/lists/*
 
 WORKDIR /app
+
 COPY requirements.txt .
-RUN python - <<'PY'
-from pathlib import Path
-lines = Path("requirements.txt").read_text(encoding="utf-8").splitlines()
-filtered = [
-    line for line in lines
-    if not line.strip().startswith("torch>=")
-    and not line.strip().startswith("torchvision>=")
-]
-Path("/tmp/requirements-no-torch.txt").write_text("\n".join(filtered) + "\n", encoding="utf-8")
-PY
-RUN pip install --no-cache-dir --extra-index-url https://download.pytorch.org/whl/cpu \
-    torch==2.6.0+cpu torchvision==0.21.0+cpu \
-    -r /tmp/requirements-no-torch.txt
+RUN pip install --no-cache-dir --upgrade pip && \
+    pip install --no-cache-dir -r requirements.txt
 
 COPY . .
 
 ENV MODEL_CACHE_DIR=/data/models
 ENV TOKENIZERS_PARALLELISM=false
+ENV MESA_GL_VERSION_OVERRIDE=3.3
+ENV PYOPENGL_PLATFORM=egl
 ENV PYTHONUNBUFFERED=1
-ENV PYTHONPATH=/app
 
 EXPOSE 7860
 CMD ["python", "spaces/app.py"]
```
FIX.md
ADDED

# FIX.md — How to Stop All Startup Errors

The logs show the OLD engine code is still running. The files from the previous
session were not copied into the project. Do these steps in order.

---

## Step 1 — Replace fingerprint engine

Copy `fingerprint_engine.py` (from outputs) to:

```
src/engines/fingerprint/engine.py
```

This removes ALL broken models:

- `yermandy/deepfake-detection` — gone
- `yermandy/GenD_CLIP_L_14` — gone
- `yermandy/GenD_DINOv3_L` — gone
- `Wvolf/ViT_Deepfake_Detection` — gone
- `trust_remote_code` kwarg bug — fixed

It replaces them with 3 working models (loaded roughly as in the sketch after this list):

- `Organika/sdxl-detector`
- `haywoodsloan/ai-image-detector-deploy`
- `dima806/deepfake_vs_real_image_detection`
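
A minimal sketch of loading these three with the `transformers` pipeline API. The model IDs and the `MODEL_CACHE_DIR` variable come from this commit; the loop and error handling are illustrative assumptions, not the engine's exact code:

```python
# Sketch: load the three replacement detectors as image-classification
# pipelines, caching weights under MODEL_CACHE_DIR (set in the Dockerfile).
import os
from transformers import pipeline

MODEL_IDS = [
    "Organika/sdxl-detector",
    "haywoodsloan/ai-image-detector-deploy",
    "dima806/deepfake_vs_real_image_detection",
]

def load_detectors() -> list:
    cache_dir = os.environ.get("MODEL_CACHE_DIR", "/tmp/models")
    detectors = []
    for model_id in MODEL_IDS:
        try:
            detectors.append(pipeline(
                "image-classification",
                model=model_id,
                model_kwargs={"cache_dir": cache_dir},
            ))
            print(f"✓ detector: {model_id}")
        except Exception as exc:  # keep loading the rest if one model fails
            print(f"✗ detector {model_id} failed: {exc}")
    return detectors
```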

---

## Step 2 — Replace coherence engine

Copy `coherence_engine.py` (from outputs) to:

```
src/engines/coherence/engine.py
```

This removes the broken wav2vec model
(`nii-yamagishilab/wav2vec-large-anti-deepfake-nda`), which has incompatible
weights and was producing random output anyway. Coherence now runs visual-only
(FaceNet + MediaPipe).

---

## Step 3 — Replace SSTGNN engine

Copy `sstgnn_engine.py` (from outputs) to:

```
src/engines/sstgnn/engine.py
```

This removes `Wvolf/ViT_Deepfake_Detection` and uses `dima806` + `prithivMLmods` only.

---

## Step 4 — Fix the Dockerfile (libGLESv2 error)

Replace your `Dockerfile` with this exactly:

```dockerfile
FROM python:3.11-slim

RUN apt-get update && apt-get install -y --no-install-recommends \
    ffmpeg \
    libgl1 \
    libglib2.0-0 \
    libsm6 \
    libxext6 \
    libxrender1 \
    libgles2 \
    libegl1 \
    libgbm1 \
    libgomp1 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt

COPY . .

ENV MODEL_CACHE_DIR=/data/models
ENV TOKENIZERS_PARALLELISM=false
ENV MESA_GL_VERSION_OVERRIDE=3.3
ENV PYOPENGL_PLATFORM=egl
ENV PYTHONUNBUFFERED=1

EXPOSE 7860
CMD ["python", "spaces/app.py"]
```

The key additions are `libgles2 libegl1 libgbm1` — MediaPipe requires OpenGL ES
even for CPU-only inference. Without these packages it always throws
`libGLESv2.so.2: cannot open shared object file`.
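
A quick smoke test for the rebuilt image (a sketch; the `FaceMesh` options mirror the engine's). Importing MediaPipe and constructing `FaceMesh` forces its GL/EGL shared libraries to load, so this should raise the same `libGLESv2.so.2` error if the packages are still missing:

```python
# Smoke test sketch: fails at import/construction time when the image
# lacks libgles2/libegl1/libgbm1, succeeds once they are installed.
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(
    static_image_mode=True,
    max_num_faces=1,
    refine_landmarks=True,
    min_detection_confidence=0.5,
)
print("MediaPipe FaceMesh initialized — GL/EGL libraries found")
face_mesh.close()
```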

---

## Step 5 — Fix requirements.txt (torch CVE block)

Replace the torch lines in `requirements.txt`:

```
torch>=2.6.0
torchvision>=0.21.0
torchaudio>=2.6.0
```

Torch < 2.6 blocks loading `.pt` files due to CVE-2025-32434.
`Wvolf/ViT_Deepfake_Detection` uses `.pt` — it will NEVER load on torch < 2.6.
Since you're removing that model anyway, this is a safety measure for other
models.
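
To fail fast at startup rather than mid-load, a version gate like the sketch below works; the helper name and error message are hypothetical, not existing project code (`packaging` ships as a `transformers` dependency):

```python
# Hypothetical startup guard: refuse to run on torch < 2.6, which cannot
# safely load .pt checkpoints (CVE-2025-32434).
from packaging import version
import torch

def assert_torch_can_load_pt() -> None:
    # torch.__version__ may carry a build suffix such as "2.6.0+cpu"
    if version.parse(torch.__version__).release < (2, 6):
        raise RuntimeError(
            f"torch {torch.__version__} < 2.6 cannot safely load .pt files "
            "(CVE-2025-32434); upgrade per requirements.txt"
        )

assert_torch_can_load_pt()
```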

---

## Step 6 — Rebuild and redeploy

```bash
# If running locally / Docker:
docker build --no-cache -t genai-deepdetect .
docker run -p 7860:7860 genai-deepdetect

# If on HuggingFace Spaces:
git add src/engines/fingerprint/engine.py
git add src/engines/coherence/engine.py
git add src/engines/sstgnn/engine.py
git add Dockerfile
git add requirements.txt
git commit -m "fix: remove broken models, add libgles2 for mediapipe"
git push
```

HF Spaces will rebuild the Docker image automatically on push. Watch the build
logs — the apt-get install should now include libgles2.

---

## What the fixed startup should look like

```
Fingerprint engine: loading models...
✓ detector: Organika/sdxl-detector
✓ detector: haywoodsloan/ai-image-detector-deploy
✓ detector: dima806/deepfake_vs_real_image_detection
✓ CLIP ViT-L/14 loaded for generator attribution
Fingerprint engine ready: 3 detectors, CLIP=ok

Coherence engine: loading models...
✓ FaceNet MTCNN + InceptionResnetV1 (VGGFace2) loaded
✓ MediaPipe FaceMesh loaded          ← only works after Dockerfile fix
Coherence engine ready: facenet=ok, mediapipe=ok

SSTGNN engine: loading models...
✓ SSTGNN detector: dima806/deepfake_vs_real_image_detection
✓ SSTGNN detector: prithivMLmods/Deep-Fake-Detector-Model
✓ MediaPipe FaceMesh loaded for SSTGNN graph
SSTGNN engine ready: 2 detectors, mediapipe=ok
```

---

## Summary

| Error | Cause | Fix |
| --- | --- | --- |
| `yermandy/*` warnings | custom GenD arch | removed from engine |
| `Wvolf/*` torch CVE error | `.pt` file + torch < 2.6 | removed from engine |
| `trust_remote_code` TypeError | duplicate kwarg in `_build_image_classifier` | removed from all `pipeline()` calls |
| `wav2vec` MISSING/UNEXPECTED keys | custom `m_ssl.*` namespace, incompatible | removed from engine |
| `libGLESv2.so.2 not found` | missing apt packages in Docker | add `libgles2 libegl1 libgbm1` |
requirements.txt
CHANGED

```diff
@@ -12,6 +12,7 @@ transformers>=4.40.0
 timm>=1.0.0
 torch>=2.6.0
 torchvision>=0.21.0
+torchaudio>=2.6.0
 
 # ML - coherence
 # facenet-pytorch currently has limited support on newer Python versions.
```
runpod_handler.py
CHANGED

```diff
@@ -16,7 +16,7 @@ from src.engines.fingerprint.engine import FingerprintEngine
 from src.engines.sstgnn.engine import SSTGNNEngine
 from src.explainability.explainer import explain
 from src.fusion.fuser import fuse
-from src.services.media_utils import extract_video_frames, extract_audio_waveform
+from src.services.media_utils import extract_video_frames
 
 _fp = FingerprintEngine()
 _co = CoherenceEngine()
@@ -47,17 +47,11 @@ def handler(job: dict) -> dict:
 
     try:
         frames = extract_video_frames(tmp_path, max_frames=300)
-        audio = extract_audio_waveform(tmp_path, sample_rate=16000)
     finally:
         os.unlink(tmp_path)
 
-    audio_waveform = None
-    audio_sample_rate = 16000
-    if audio is not None:
-        audio_waveform, audio_sample_rate = audio
-
     fp = _fp.run_video(frames)
-    co = _co.run_video(frames, audio_waveform, audio_sample_rate)
+    co = _co.run_video(frames)
     st = _st.run_video(frames)
     verdict, conf, generator = fuse([fp, co, st], is_video=True)
 
```
src/api/main.py
CHANGED

```diff
@@ -27,7 +27,7 @@ from src.services.inference_router import (
     is_runpod_configured,
     route_inference,
 )
-from src.services.media_utils import extract_video_frames, extract_audio_waveform
+from src.services.media_utils import extract_video_frames
 from src.types import DetectionResponse, EngineResult
 
 logger = logging.getLogger(__name__)
@@ -93,10 +93,7 @@ def _model_inventory() -> dict[str, object]:
         "attribution_model": "openai/clip-vit-large-patch14",
     },
     "coherence": {
-        "audio_deepfake_model": os.environ.get(
-            "COHERENCE_AUDIO_MODEL_ID",
-            "",
-        ),
+        "audio_deepfake_model": "disabled (visual-only coherence)",
         "facial_landmarks": "mediapipe FaceMesh/FaceLandmarker",
         "temporal_embedding": "facenet-pytorch InceptionResnetV1(vggface2) when available",
     },
@@ -391,9 +388,7 @@ async def detect_video(file: UploadFile = File(...)) -> DetectionResponse:
         tmp_path = tmp.name
 
     try:
-        frames_task = asyncio.to_thread(extract_video_frames, tmp_path, MAX_FRAMES)
-        audio_task = asyncio.to_thread(extract_audio_waveform, tmp_path, 16000)
-        frames, audio = await asyncio.gather(frames_task, audio_task)
+        frames = await asyncio.to_thread(extract_video_frames, tmp_path, MAX_FRAMES)
     finally:
         Path(tmp_path).unlink(missing_ok=True)
 
@@ -401,14 +396,9 @@
         raise HTTPException(status_code=422, detail="Could not extract frames")
 
     await _ensure_models_loaded()
-    audio_waveform = None
-    audio_sample_rate = 16000
-    if audio is not None:
-        audio_waveform, audio_sample_rate = audio
-
     fp, co, st = await asyncio.gather(
         asyncio.to_thread(_fp.run_video, frames),
-        asyncio.to_thread(_co.run_video, frames, audio_waveform, audio_sample_rate),
+        asyncio.to_thread(_co.run_video, frames),
         asyncio.to_thread(_st.run_video, frames),
     )
 
```
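The simplified `detect_video` path keeps the same concurrency shape: each blocking engine call is pushed onto a worker thread with `asyncio.to_thread` and the three results are awaited together with `asyncio.gather`. A self-contained sketch of that pattern (the `run_video` stub is a placeholder, not the project's engines):

```python
# Sketch of the to_thread + gather pattern from detect_video above.
import asyncio
import time

def run_video(name: str, frames: list) -> str:
    time.sleep(0.1)  # stands in for blocking, CPU-bound inference
    return f"{name}: scored {len(frames)} frames"

async def detect(frames: list) -> tuple[str, str, str]:
    # Each call runs in the default thread pool; gather awaits all three.
    fp, co, st = await asyncio.gather(
        asyncio.to_thread(run_video, "fingerprint", frames),
        asyncio.to_thread(run_video, "coherence", frames),
        asyncio.to_thread(run_video, "sstgnn", frames),
    )
    return fp, co, st

print(asyncio.run(detect([0] * 8)))
```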
src/engines/coherence/detector.py
CHANGED

```diff
@@ -9,7 +9,7 @@ import tempfile
 import numpy as np
 
 from src.types import EngineResult
-from src.services.media_utils import extract_video_frames, extract_audio_waveform
+from src.services.media_utils import extract_video_frames
 
 from .engine import CoherenceEngine
 
@@ -18,26 +18,21 @@ class CoherenceDetector(CoherenceEngine):
     threshold = 0.5
 
     def detect_bytes(self, video_bytes: bytes) -> EngineResult:
-        frames, waveform, sample_rate = self._extract_video_frames(video_bytes)
+        frames = self._extract_video_frames(video_bytes)
         if not frames:
             return self._error_result(0.0)
         try:
-            return self.run_video(frames, waveform, sample_rate)
+            return self.run_video(frames)
         except Exception:
             return self._error_result(0.0)
 
-    def _extract_video_frames(self, video_bytes: bytes):
+    def _extract_video_frames(self, video_bytes: bytes) -> list[np.ndarray]:
         with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
             tmp.write(video_bytes)
             tmp_path = tmp.name
 
         try:
-            frames = extract_video_frames(tmp_path, max_frames=64)
-            audio = extract_audio_waveform(tmp_path, sample_rate=16000)
-            if audio is None:
-                return frames, None, 16000
-            waveform, sample_rate = audio
-            return frames, waveform, sample_rate
+            return extract_video_frames(tmp_path, max_frames=64)
         finally:
             os.unlink(tmp_path)
 
```
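The rewritten `_extract_video_frames` keeps the write-then-unlink temp-file pattern, since the frame extractor needs a real path on disk. A standalone sketch of the same pattern (the placeholder decoder stands in for `extract_video_frames`):

```python
# Sketch: persist bytes to a named temp file (delete=False so it survives
# the with-block), hand the path to a decoder, and always unlink it.
import os
import tempfile

def frames_from_bytes(video_bytes: bytes) -> list[str]:
    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
        tmp.write(video_bytes)
        tmp_path = tmp.name

    try:
        # placeholder for extract_video_frames(tmp_path, max_frames=64)
        return [f"{tmp_path}:frame0"]
    finally:
        os.unlink(tmp_path)  # cleanup runs even if decoding raises

print(frames_from_bytes(b"\x00"))
```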
src/engines/coherence/engine.py
CHANGED

```diff
@@ -6,7 +6,6 @@ import threading
 import time
 import urllib.request
 from pathlib import Path
-from typing import Any
 
 import numpy as np
 from PIL import Image
@@ -21,8 +20,6 @@ _mtcnn = None
 _resnet = None
 _face_mesh = None
 _torch = None
-_audio_detector = None
-_DEFAULT_AUDIO_MODEL_ID = ""
 
 
 def _skip_model_loads() -> bool:
@@ -34,14 +31,6 @@ def _skip_model_loads() -> bool:
 }
 
 
-def _get_pipeline():
-    try:
-        from transformers import pipeline as hf_pipeline  # type: ignore
-    except Exception:
-        from transformers.pipelines import pipeline as hf_pipeline  # type: ignore
-    return hf_pipeline
-
-
 def _short_error(exc: Exception, *, limit: int = 300) -> str:
     message = " ".join(str(exc).strip().split())
     if len(message) > limit:
@@ -97,6 +86,7 @@ def _build_face_mesh():
             max_num_faces=1,
             refine_landmarks=True,
             min_detection_confidence=0.5,
+            min_tracking_confidence=0.5,
         )
 
         from mediapipe.tasks import python as mp_tasks_python  # type: ignore
@@ -112,33 +102,8 @@ def _build_face_mesh():
     return _TasksFaceMeshAdapter(mp, landmarker)
 
 
-def _build_audio_classifier(model_id: str) -> Any:
-    pipeline = _get_pipeline()
-
-    cache_dir = os.environ.get("MODEL_CACHE_DIR", "/tmp/models")
-    attempts = (
-        {"trust_remote_code": True, "model_kwargs": {"cache_dir": cache_dir}},
-        {"trust_remote_code": True},
-        {"model_kwargs": {"cache_dir": cache_dir}},
-        {},
-    )
-    last_exc: Exception | None = None
-    for kwargs in attempts:
-        try:
-            return pipeline(
-                "audio-classification",
-                model=model_id,
-                **kwargs,
-            )
-        except Exception as exc:
-            last_exc = exc
-    if last_exc is not None:
-        raise last_exc
-    raise RuntimeError(f"Unable to load audio-classification pipeline for {model_id}")
-
-
 def _load() -> None:
-    global _mtcnn, _resnet, _face_mesh, _load_attempted, _torch, _audio_detector
+    global _mtcnn, _resnet, _face_mesh, _load_attempted, _torch
     if _load_attempted:
         return
 
@@ -173,15 +138,6 @@ def _load() -> None:
             _short_error(exc),
         )
 
-    model_id = os.environ.get("COHERENCE_AUDIO_MODEL_ID", _DEFAULT_AUDIO_MODEL_ID).strip()
-    if not model_id:
-        logger.info("Coherence audio model disabled (set COHERENCE_AUDIO_MODEL_ID to enable).")
-    else:
-        try:
-            _audio_detector = _build_audio_classifier(model_id)
-        except Exception as exc:
-            logger.warning("Coherence audio model unavailable (%s): %s", model_id, _short_error(exc))
-
     logger.info("Coherence model load attempt complete")
 
 
@@ -238,12 +194,7 @@ class CoherenceEngine:
         logger.warning("Coherence image scoring failed: %s", exc)
         return 0.35
 
-    def run_video(
-        self,
-        frames: list[np.ndarray],
-        audio_waveform: np.ndarray | None = None,
-        audio_sample_rate: int = 16000,
-    ) -> EngineResult:
+    def run_video(self, frames: list[np.ndarray]) -> EngineResult:
         t0 = time.perf_counter()
         self._ensure()
 
@@ -265,8 +216,7 @@
         delta = self._embedding_variance(frames)
         jerk = self._landmark_jerk(frames)
         blink = self._blink_anomaly(frames)
-        audio = self._audio_deepfake_score(audio_waveform, audio_sample_rate)
-        score = float(np.clip(delta * 0.35 + jerk * 0.30 + blink * 0.15 + audio * 0.20, 0.0, 1.0))
+        score = float(np.clip(delta * 0.45 + jerk * 0.35 + blink * 0.20, 0.0, 1.0))
 
         return EngineResult(
             engine="coherence",
@@ -276,50 +226,11 @@
             explanation=(
                 f"Embedding variance {delta:.2f}, "
                 f"landmark jerk {jerk:.2f}, "
-                f"blink anomaly {blink:.2f}, "
-                f"audio deepfake score {audio:.2f}."
+                f"blink anomaly {blink:.2f}."
             ),
             processing_time_ms=(time.perf_counter() - t0) * 1000,
         )
 
-    def _audio_deepfake_score(self, waveform: np.ndarray | None = None, sample_rate: int = 16000) -> float:
-        if _audio_detector is None:
-            return 0.5
-        if waveform is None or waveform.size == 0:
-            return 0.5
-
-        max_seconds = int(os.environ.get("COHERENCE_AUDIO_MAX_SECONDS", "30"))
-        max_samples = max(16000, sample_rate * max_seconds)
-        if waveform.size > max_samples:
-            waveform = waveform[:max_samples]
-
-        try:
-            preds = _audio_detector(
-                {"array": waveform.astype(np.float32), "sampling_rate": sample_rate},
-                top_k=5,
-            )
-        except Exception:
-            return 0.5
-
-        if isinstance(preds, dict):
-            preds = [preds]
-        if preds and isinstance(preds[0], list):
-            preds = preds[0]
-        if not preds:
-            return 0.5
-
-        fake_keywords = ("spoof", "fake", "deepfake", "synthetic", "generated")
-        best = 0.0
-        for pred in preds:
-            label = str(pred.get("label", "")).lower()
-            score = float(pred.get("score", 0.0))
-            if any(keyword in label for keyword in fake_keywords):
-                best = max(best, score)
-
-        if best == 0.0:
-            return 0.5
-        return float(np.clip(best, 0.0, 1.0))
-
     def _embedding_variance(self, frames: list[np.ndarray]) -> float:
         if _mtcnn is None or _resnet is None or _torch is None:
             return 0.5
```
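With the audio term removed, `run_video` reweights the three visual cues from 0.35/0.30/0.15 (plus 0.20 audio) to 0.45/0.35/0.20, roughly renormalizing the old visual weights. A toy sketch of the new fusion with made-up cue values:

```python
# Toy sketch of the visual-only coherence score; the weights and clipping
# match the new run_video above, the cue values are invented.
import numpy as np

def coherence_score(delta: float, jerk: float, blink: float) -> float:
    # embedding variance, landmark jerk, blink anomaly -> score in [0, 1]
    return float(np.clip(delta * 0.45 + jerk * 0.35 + blink * 0.20, 0.0, 1.0))

print(coherence_score(delta=0.62, jerk=0.48, blink=0.30))  # 0.507
```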
src/services/hf_inference_client.py
CHANGED

```diff
@@ -22,7 +22,7 @@ import httpx
 logger = logging.getLogger(__name__)
 
 _HF_API_BASE = "https://api-inference.huggingface.co"
-_DEFAULT_MODEL = "
+_DEFAULT_MODEL = "dima806/deepfake_vs_real_image_detection"
 
 
 class HFInferenceUnavailable(RuntimeError):
```
src/services/runpod_client.py
CHANGED

```diff
@@ -45,7 +45,7 @@ class RunPodClient:
     def __init__(self) -> None:
         self._api_key = os.environ.get("RUNPOD_API_KEY", "")
         self._endpoint_id = os.environ.get("RUNPOD_ENDPOINT_ID", "")
-        self._model_id = os.environ.get("RUNPOD_MODEL_ID", "
+        self._model_id = os.environ.get("RUNPOD_MODEL_ID", "dima806/deepfake_vs_real_image_detection")
 
     @property
     def available(self) -> bool:
```
|