Spaces: akagtag/deepdetection


akagtag committed 22 days ago

Commit cf54850

1 Parent(s): 39f9e8e

align project with CLAUDE spec and hf space deploy


Files changed (28)

.env.example +5 -0
.gitignore +3 -1
CLAUDE.md +633 -1177
app.py +104 -0
modules/__init__.py +16 -0
modules/m1_lipsync.py +35 -0
modules/m2_fingerprint.py +44 -0
modules/m3_fallback.py +21 -0
modules/m3_sstgnn.py +4 -0
modules/m5_explain.py +74 -0
modules/m5_fusion.py +40 -0
packages.txt +3 -0
requirements.txt +7 -5
runpod_handler.py +4 -5
src/api/main.py +2 -1
src/engines/coherence/engine.py +96 -43
src/engines/fingerprint/engine.py +106 -33
src/engines/sstgnn/engine.py +72 -26
src/explainability/explainer.py +55 -208
src/fusion/fuser.py +104 -109
src/training/config.py +12 -9
test_assets/README.md +5 -0
tests/training/test_datasets.py +3 -3
tests/training/test_metrics.py +4 -4
utils/__init__.py +5 -0
utils/graph.py +45 -0
utils/video.py +13 -0
weights/README.md +5 -0

.env.example ADDED

	@@ -0,0 +1,5 @@

+NVIDIA_API_KEY=nvapi-your-key
+HF_TOKEN=hf_your_token
+INFERENCE_BACKEND=local
+MODEL_CACHE_DIR=/tmp/models

.gitignore CHANGED

@@ -12,6 +12,9 @@ data/
 *.zip
 *.tar
 *.tar.gz
+test_assets/*.mp4
+test_assets/*.mov
+test_assets/*.avi
 # ── Cache dirs (never commit these) ──────────────────────────────────────────
 .deps-local/
@@ -42,7 +45,6 @@ training/logs/
 venv/
 .venv/
 env/
-.env.example
 # ── IDE ───────────────────────────────────────────────────────────────────────
 .vscode/

CLAUDE.md CHANGED

@@ -1,1323 +1,779 @@
-# CLAUDE.md — GenAI-DeepDetect Agent Instructions
-> Read this file before touching any code. It is the single source of truth for
-> how this repo is structured, what conventions to follow, and what the hard
-> constraints are.
-# CLAUDE.md — GenAI-DeepDetect
-Full implementation guide for AI-assisted development on this project. Read this
-file before touching any code.
--# CLAUDE.md — GenAI-DeepDetect
-Complete implementation guide. Read this before writing any code. All models are
-**100% pre-trained** — no training required, no GPU needed locally.
 ---
-## MCP Tools — Always Use These First
-Before writing any code or looking up any API, resolve docs through MCP:
-```
-context7: resolve-library-id + query-docs
-  → use for: transformers, torch, mediapipe, fastapi, torch-geometric,
-    google-generativeai, facenet-pytorch, opencv, next.js, runpod
-huggingface: model_search + model_details + hf_doc_search
-  → use for: finding model cards, checking input formats, confirming
-    pipeline task names, verifying checkpoint sizes before using
-```
-**Rule**: Never guess an API signature. Always call `context7.query-docs` first.
-Never use a HF model without calling `huggingface.model_details` to confirm it
-exists, check its license, and verify its input format.
----
-## Project Skill And Memory Policy
-For work in this repository, always prefer the installed Claude Code skill pack
-when a relevant skill applies instead of ad hoc workflows.
-- **Always-on user preference**: use Awesome Claude Code workflows with
-  Superpowers + Claude Mem by default, and execute implementation steps
-  automatically unless the user explicitly asks for planning-only mode.
-- At task start, check Superpowers process skills first (for example:
-  `using-superpowers`, `brainstorming`, `systematic-debugging`,
-  `verification-before-completion`) and apply the relevant ones before coding.
-- For memory-aware tasks, use Claude Mem (`mem-search`) automatically to recall
-  prior decisions, fixes, and session history when that context can reduce risk
-  or rework.
-- If there is a conflict between this default behavior and a direct user
-  instruction in the current chat, follow the direct user instruction.
-- Use `context7-mcp` for any library, framework, SDK, or API question, and
-  before changing code that depends on external packages or hosted services.
-- Use `mem-search` / claude-mem whenever the user asks about previous sessions,
-  prior fixes, earlier decisions, or "how we solved this before".
-- When using claude-mem, scope searches to project name `genai-deepdetect`
-  unless the user explicitly asks for a broader search.
-- Keep following the repo-specific MCP rules below even when a general-purpose
-  skill also applies.
-Recommended companion skills for this project:
-- `systematic-debugging` for bugs, failing tests, or unexpected runtime
-  behavior
-- `verification-before-completion` before claiming a fix is done
-- `security-review` for secrets, external APIs, uploads, and auth-sensitive
-  changes
 ---
-## Project Goal
-Multimodal deepfake and AI-generated content detector.
-- Input: image (JPEG/PNG/WEBP) or video (MP4/MOV/AVI, max 100MB)
-- Output: `DetectionResponse` — verdict, confidence, generator attribution,
-  natural-language explanation, per-engine breakdown
-All inference runs on pre-trained HuggingFace checkpoints. No training scripts
-need to run for the system to work.
 ---
-## Architecture
 ```
-Request (image/video)
-        │
-        ▼
-  FastAPI  src/api/main.py
-        │
-        ├── FingerprintEngine   (image artifacts, generator attribution)
-        ├── CoherenceEngine     (lip-sync, biological coherence)
-        └── SSTGNNEngine        (landmark spatio-temporal graph)
-                │
-                ▼
-          Fuser  src/fusion/fuser.py
-                │
-                ▼
-        Explainer  src/explainability/explainer.py   ← Gemini API
-                │
-                ▼
-        DetectionResponse  src/types.py
 ```
 ---
-## All Pre-Trained Models
-Every model downloads via `transformers.pipeline()` or `from_pretrained()`. Zero
-training. Zero fine-tuning.
-| Engine      | Model               | HF ID                                      | Size   | Task                   |
-| ----------- | ------------------- | ------------------------------------------ | ------ | ---------------------- |
-| Fingerprint | SDXL Detector       | `Organika/sdxl-detector`                   | ~330MB | binary fake/real       |
-| Fingerprint | CLIP ViT-L/14       | `openai/clip-vit-large-patch14`            | ~3.5GB | generator attribution  |
-| Fingerprint | AI Image Detector   | `haywoodsloan/ai-image-detector-deploy`    | ~90MB  | ensemble backup        |
-| SSTGNN      | DeepFake Detector   | `dima806/deepfake_vs_real_image_detection` | ~100MB | ResNet50 per-frame     |
-| SSTGNN      | Deep Fake Detector  | `prithivMLmods/Deep-Fake-Detector-Model`   | ~80MB  | EfficientNet-B4 backup |
-| Coherence   | MediaPipe Face Mesh | bundled in `mediapipe` package             | ~10MB  | landmark extraction    |
-| Coherence   | FaceNet VGGFace2    | `facenet-pytorch` (auto-downloads)         | ~100MB | temporal embeddings    |
-| Coherence   | SyncNet             | `Junhua-Zhu/SyncNet`                       | ~50MB  | lip-sync offset        |
-CLIP is the largest at 3.5GB — preload at startup, never reload. Everything else
-fits in HF Spaces 16GB RAM free tier.
 ---
-## Environment Variables
-```bash
-# Required
-GEMINI_API_KEY=...                  # Google AI Studio — free tier works
-HF_TOKEN=hf_...                     # HuggingFace read token (free)
-# Hosting
-RUNPOD_API_KEY=...                  # RunPod serverless (heavy video)
-RUNPOD_ENDPOINT_ID=...              # your deployed endpoint ID
-# Paths
-MODEL_CACHE_DIR=/data/models        # HF Spaces: /data/models (persists)
-                                    # local dev: /tmp/models
-# Optional
-MAX_VIDEO_FRAMES=300
-MAX_VIDEO_SIZE_MB=100
-INFERENCE_BACKEND=local             # "local" | "runpod"
-TOKENIZERS_PARALLELISM=false
-```
-Set all secrets in:
-- HF Spaces → Settings → Repository secrets
-- RunPod → Secrets tab
-- Vercel → Environment Variables
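The optional variables in the removed block above all have defaults, so one way to consume them is a small settings loader. This is a minimal sketch, not code from the repository: the `Settings` class and `load_settings` function are hypothetical names, and the defaults are copied from the listing (`/tmp/models` for local dev, 300 frames, 100 MB, `local` backend).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the optional variables listed above."""
    model_cache_dir: str
    max_video_frames: int
    max_video_size_mb: int
    inference_backend: str


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    backend = env.get("INFERENCE_BACKEND", "local")
    # The listing documents only these two values.
    if backend not in {"local", "runpod"}:
        raise ValueError(f"Unsupported INFERENCE_BACKEND: {backend!r}")
    return Settings(
        model_cache_dir=env.get("MODEL_CACHE_DIR", "/tmp/models"),
        max_video_frames=int(env.get("MAX_VIDEO_FRAMES", "300")),
        max_video_size_mb=int(env.get("MAX_VIDEO_SIZE_MB", "100")),
        inference_backend=backend,
    )


if __name__ == "__main__":
    print(load_settings({"INFERENCE_BACKEND": "runpod"}))
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test without touching the process environment.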
----
-## Gemini API — Explainability Engine
-**Primary model**: `gemini-2.5-pro-preview-03-25` **Fallback model**:
-`gemini-1.5-pro-002`
-Both available on Google AI Studio free tier (15 req/min, 1M tokens/day). Always
-query `context7.query-docs google-generativeai GenerativeModel` before modifying
-this file.
-### `src/explainability/explainer.py`
 ```python
-import os
-import logging
-import google.generativeai as genai
-from src.types import EngineResult
-logger = logging.getLogger(__name__)
-genai.configure(api_key=os.environ["GEMINI_API_KEY"])
-SYSTEM_INSTRUCTION = (
-    "You are a deepfake forensics analyst writing reports for security professionals. "
-    "Given detection engine outputs, write exactly 2-3 sentences in plain English "
-    "explaining why the content is real or fake. "
-    "Be specific — name the strongest signals. "
-    "Use direct declarative sentences. No hedging. No 'I think'. "
-    "Output only the explanation text, nothing else."
-)
-_model = None
-def _get_model() -> genai.GenerativeModel:
-    global _model
-    if _model is None:
-        for name in ("gemini-2.5-pro-preview-03-25", "gemini-1.5-pro-002"):
-            try:
-                _model = genai.GenerativeModel(
-                    model_name=name,
-                    system_instruction=SYSTEM_INSTRUCTION,
-                )
-                logger.info(f"Gemini model loaded: {name}")
-                break
-            except Exception as e:
-                logger.warning(f"Gemini {name} unavailable: {e}")
-    return _model
-def explain(
-    verdict: str,
-    confidence: float,
-    engine_results: list[EngineResult],
-    generator: str,
-) -> str:
-    breakdown = "\n".join(
-        f"- {r.engine}: {r.verdict} ({r.confidence:.0%}) — {r.explanation}"
-        for r in engine_results
-    )
-    prompt = (
-        f"Verdict: {verdict} ({confidence:.0%} confidence)\n"
-        f"Attributed generator: {generator}\n"
-        f"Engine breakdown:\n{breakdown}\n\n"
-        "Write the forensics explanation."
-    )
-    try:
-        model = _get_model()
-        if model is None:
-            raise RuntimeError("No Gemini model available")
-        response = model.generate_content(prompt)
-        return response.text.strip()
-    except Exception as e:
-        logger.error(f"Gemini explain failed: {e}")
-        top = engine_results[0] if engine_results else None
-        return (
-            f"Content classified as {verdict} with {confidence:.0%} confidence. "
-            f"{'Primary signal from ' + top.engine + ' engine.' if top else ''}"
-        )
-```
----
-## Engine Implementations
-### FingerprintEngine — `src/engines/fingerprint/engine.py`
-Query context7 for `transformers pipeline image-classification` and
-`huggingface model_details Organika/sdxl-detector` before modifying.
-```python
-import os, logging, threading
-import numpy as np
-from PIL import Image
-from transformers import pipeline, CLIPModel, CLIPProcessor
-import torch
-from src.types import EngineResult
-logger = logging.getLogger(__name__)
-CACHE = os.environ.get("MODEL_CACHE_DIR", "/tmp/models")
-GENERATOR_PROMPTS = {
-    "real":             "a real photograph taken by a camera with natural lighting",
-    "unknown_gan":      "a GAN-generated image with checkerboard artifacts and blurry edges",
-    "stable_diffusion": "a Stable Diffusion image with painterly soft textures",
-    "midjourney":       "a Midjourney image with cinematic dramatic lighting and hyperdetail",
-    "dall_e":           "a DALL-E image with clean illustration-style and smooth gradients",
-    "flux":             "a FLUX model image with photorealistic precision and sharp detail",
-    "firefly":          "an Adobe Firefly image with commercial stock-photo aesthetics",
-    "imagen":           "a Google Imagen image with precise photorealistic rendering",
-}
-_lock = threading.Lock()
-_detector = _clip_model = _clip_processor = _backup = None
-def _load():
-    global _detector, _clip_model, _clip_processor, _backup
-    if _detector is not None:
-        return
-    logger.info("Loading fingerprint models...")
-    _detector = pipeline("image-classification",
-                         model="Organika/sdxl-detector", cache_dir=CACHE)
-    _clip_model = CLIPModel.from_pretrained(
-        "openai/clip-vit-large-patch14", cache_dir=CACHE)
-    _clip_processor = CLIPProcessor.from_pretrained(
-        "openai/clip-vit-large-patch14", cache_dir=CACHE)
-    _clip_model.eval()
-    try:
-        _backup = pipeline("image-classification",
-                           model="haywoodsloan/ai-image-detector-deploy",
-                           cache_dir=CACHE)
-    except Exception:
-        logger.warning("Backup fingerprint detector unavailable")
-    logger.info("Fingerprint models ready")
-class FingerprintEngine:
-    def _ensure(self):
-        with _lock:
-            _load()
-    def run(self, image: Image.Image) -> EngineResult:
-        self._ensure()
-        if image.mode != "RGB":
-            image = image.convert("RGB")
-        # Binary fake score
-        FAKE_LABELS = {"artificial", "fake", "ai-generated", "generated"}
-        try:
-            preds = _detector(image)
-            fake_score = max(
-                (p["score"] for p in preds if p["label"].lower() in FAKE_LABELS),
-                default=0.5,
-            )
-        except Exception as e:
-            logger.warning(f"Primary detector error: {e}")
-            fake_score = 0.5
-        # Ensemble backup
-        if _backup is not None:
-            try:
-                bp = _backup(image)
-                bk = max((p["score"] for p in bp
-                          if p["label"].lower() in FAKE_LABELS), default=0.5)
-                fake_score = fake_score * 0.6 + bk * 0.4
-            except Exception:
-                pass
-        # CLIP zero-shot generator attribution
-        generator = "real"
-        try:
-            texts = list(GENERATOR_PROMPTS.values())
-            inputs = _clip_processor(
-                text=texts, images=image,
-                return_tensors="pt", padding=True, truncation=True,
-            )
-            with torch.no_grad():
-                logits = _clip_model(**inputs).logits_per_image[0]
-            probs = logits.softmax(dim=0).numpy()
-            generator = list(GENERATOR_PROMPTS.keys())[int(np.argmax(probs))]
-        except Exception as e:
-            logger.warning(f"CLIP attribution error: {e}")
-        if fake_score > 0.65 and generator == "real":
-            generator = "unknown_gan"
-        return EngineResult(
-            engine="fingerprint",
-            verdict="FAKE" if fake_score > 0.5 else "REAL",
-            confidence=float(fake_score),
-            attributed_generator=generator,
-            explanation=f"Binary score {fake_score:.2f}; attributed to {generator}.",
-        )
-    def run_video(self, frames: list) -> EngineResult:
-        if not frames:
-            return EngineResult(engine="fingerprint", verdict="UNKNOWN",
-                                confidence=0.5, explanation="No frames.")
-        keyframes = frames[::8] or [frames[0]]
-        results = [self.run(Image.fromarray(f)) for f in keyframes]
-        avg = float(np.mean([r.confidence for r in results]))
-        gens = [r.attributed_generator for r in results]
-        top_gen = max(set(gens), key=gens.count)
-        return EngineResult(
-            engine="fingerprint",
-            verdict="FAKE" if avg > 0.5 else "REAL",
-            confidence=avg,
-            attributed_generator=top_gen,
-            explanation=f"Keyframe average {avg:.2f} over {len(keyframes)} frames.",
         )
 ```
 ---
-### CoherenceEngine — `src/engines/coherence/engine.py`
-Query `context7.query-docs mediapipe face_mesh` and
-`context7.query-docs facenet-pytorch InceptionResnetV1` before modifying.
 ```python
-import logging, threading, cv2
 import numpy as np
 from PIL import Image
-from facenet_pytorch import MTCNN, InceptionResnetV1
-import mediapipe as mp
-from src.types import EngineResult
-logger = logging.getLogger(__name__)
-_lock = threading.Lock()
-_mtcnn = _resnet = _face_mesh = None
-def _load():
-    global _mtcnn, _resnet, _face_mesh
-    if _mtcnn is not None:
-        return
-    logger.info("Loading coherence models...")
-    _mtcnn   = MTCNN(keep_all=False, device="cpu")
-    _resnet  = InceptionResnetV1(pretrained="vggface2").eval()
-    _face_mesh = mp.solutions.face_mesh.FaceMesh(
-        static_image_mode=False, max_num_faces=1,
-        refine_landmarks=True, min_detection_confidence=0.5,
-    )
-    logger.info("Coherence models ready")
-class CoherenceEngine:
-    def _ensure(self):
-        with _lock:
-            _load()
-    def run(self, image: Image.Image) -> EngineResult:
-        self._ensure()
-        frame = np.array(image.convert("RGB"))
-        score = self._image_score(frame)
-        return EngineResult(
-            engine="coherence",
-            verdict="FAKE" if score > 0.5 else "REAL",
-            confidence=float(score),
-            explanation=f"Geometric coherence anomaly {score:.2f} (image mode).",
         )
-    def _image_score(self, frame: np.ndarray) -> float:
-        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) if frame.shape[2] == 3 else frame
-        res = _face_mesh.process(rgb)
-        if not res.multi_face_landmarks:
-            return 0.35  # no face detected
-        lms = res.multi_face_landmarks[0].landmark
-        h, w = frame.shape[:2]
-        def pt(i):
-            return np.array([lms[i].x * w, lms[i].y * h])
-        # Eye width asymmetry — deepfakes often mismatched
-        lew = np.linalg.norm(pt(33)  - pt(133))
-        rew = np.linalg.norm(pt(362) - pt(263))
-        eye_ratio = min(lew, rew) / (max(lew, rew) + 1e-9)
-        eye_score = max(0.0, (0.85 - eye_ratio) / 0.3)
-        # Ear symmetry from nose tip
-        nose = pt(1)
-        lr = min(np.linalg.norm(nose - pt(234)), np.linalg.norm(nose - pt(454)))
-        rr = max(np.linalg.norm(nose - pt(234)), np.linalg.norm(nose - pt(454)))
-        ear_score = max(0.0, (0.90 - lr / (rr + 1e-9)) / 0.2)
-        return float(np.clip(eye_score * 0.5 + ear_score * 0.5, 0.0, 1.0))
-    def run_video(self, frames: list[np.ndarray]) -> EngineResult:
-        self._ensure()
-        if len(frames) < 4:
-            r = self.run(Image.fromarray(frames[0]))
-            r.explanation = "Too few frames for temporal analysis."
-            return r
-        delta  = self._embedding_variance(frames)
-        jerk   = self._landmark_jerk(frames)
-        blink  = self._blink_anomaly(frames)
-        score  = float(np.clip(delta * 0.45 + jerk * 0.35 + blink * 0.20, 0.0, 1.0))
-        return EngineResult(
-            engine="coherence",
-            verdict="FAKE" if score > 0.5 else "REAL",
-            confidence=score,
-            explanation=(
-                f"Embedding variance {delta:.2f}, "
-                f"landmark jerk {jerk:.2f}, "
-                f"blink anomaly {blink:.2f}."
-            ),
         )
-    def _embedding_variance(self, frames: list[np.ndarray]) -> float:
-        import torch
-        embeddings = []
-        for frame in frames[::4]:
-            try:
-                face = _mtcnn(Image.fromarray(frame))
-                if face is not None:
-                    with torch.no_grad():
-                        e = _resnet(face.unsqueeze(0)).numpy()[0]
-                    embeddings.append(e)
-            except Exception:
-                continue
-        if len(embeddings) < 2:
-            return 0.5
-        deltas = [np.linalg.norm(embeddings[i+1] - embeddings[i])
-                  for i in range(len(embeddings)-1)]
-        return float(np.clip(np.var(deltas) * 8, 0.0, 1.0))
-    def _landmark_jerk(self, frames: list[np.ndarray]) -> float:
-        positions = []
-        for frame in frames[::2]:
-            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
-            res = _face_mesh.process(rgb)
-            if res.multi_face_landmarks:
-                lm = res.multi_face_landmarks[0].landmark
-                positions.append([lm[1].x, lm[1].y])
-        if len(positions) < 4:
-            return 0.3
-        pos   = np.array(positions)
-        jerk  = np.diff(pos, n=3, axis=0)
-        return float(np.clip((np.mean(np.linalg.norm(jerk, axis=1)) - 0.002) / 0.008,
-                             0.0, 1.0))
-    def _blink_anomaly(self, frames: list[np.ndarray]) -> float:
-        LEFT_EYE  = [33, 160, 158, 133, 153, 144]
-        RIGHT_EYE = [362, 385, 387, 263, 373, 380]
-        def ear(lms, idx, h, w):
-            pts = [np.array([lms[i].x * w, lms[i].y * h]) for i in idx]
-            a = np.linalg.norm(pts[1] - pts[5])
-            b = np.linalg.norm(pts[2] - pts[4])
-            c = np.linalg.norm(pts[0] - pts[3])
-            return (a + b) / (2.0 * c + 1e-9)
-        ears = []
         for frame in frames:
-            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
-            res = _face_mesh.process(rgb)
-            if res.multi_face_landmarks:
-                lm = res.multi_face_landmarks[0].landmark
-                h, w = frame.shape[:2]
-                ears.append((ear(lm, LEFT_EYE, h, w) + ear(lm, RIGHT_EYE, h, w)) / 2)
-        if len(ears) < 10:
-            return 0.3
-        arr    = np.array(ears)
-        blinks = int(np.sum(np.diff((arr < 0.21).astype(int)) > 0))
-        bpm    = blinks / (len(ears) / 25) * 60
-        if 8 <= bpm <= 25:
-            return 0.15
-        if bpm < 3 or bpm > 35:
-            return 0.80
-        return 0.45
 ```
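The blink check in the removed `_blink_anomaly` listing combines an eye aspect ratio (EAR) with a count of open-to-closed transitions. The sketch below reproduces that arithmetic in plain Python so the numbers can be followed by hand. The function names are hypothetical, and `fps` is exposed as a parameter here only for illustration; the listing hardcodes 25 frames per second and a 0.21 threshold.

```python
import math


def eye_aspect_ratio(pts):
    """EAR from six (x, y) eye points ordered as in LEFT_EYE / RIGHT_EYE:
    indices 0 and 3 are the horizontal pair, (1, 5) and (2, 4) the vertical pairs."""
    a = math.dist(pts[1], pts[5])
    b = math.dist(pts[2], pts[4])
    c = math.dist(pts[0], pts[3])
    return (a + b) / (2.0 * c + 1e-9)


def blinks_per_minute(ears, fps=25.0, threshold=0.21):
    """Count open-to-closed transitions and scale to a per-minute rate."""
    closed = [e < threshold for e in ears]
    blinks = sum(1 for prev, cur in zip(closed, closed[1:]) if cur and not prev)
    return blinks / (len(ears) / fps) * 60


if __name__ == "__main__":
    # A wide-open eye: vertical gaps of 2, horizontal width of 3.
    eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
    print(round(eye_aspect_ratio(eye), 3))
    # Ten seconds of frames at 25 fps containing a single blink.
    series = [0.30] * 100 + [0.10] * 5 + [0.30] * 145
    print(blinks_per_minute(series))
```

With one blink in ten seconds the rate is about 6 blinks per minute, which falls in the listing's middle band (neither the 8 to 25 "normal" range nor the below-3 / above-35 "anomalous" range).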
 ---
-### SSTGNNEngine — `src/engines/sstgnn/engine.py`
-Query `context7.query-docs torch-geometric GCNConv` and
-`huggingface model_details dima806/deepfake_vs_real_image_detection` before
-modifying.
 ```python
-import logging, os, threading
-import numpy as np
-import cv2
-from PIL import Image
-from transformers import pipeline
-import mediapipe as mp
-from scipy.spatial import Delaunay
-from src.types import EngineResult
-logger = logging.getLogger(__name__)
-CACHE = os.environ.get("MODEL_CACHE_DIR", "/tmp/models")
-_lock = threading.Lock()
-_det1 = _det2 = _mesh = None
-def _load():
-    global _det1, _det2, _mesh
-    if _det1 is not None:
-        return
-    logger.info("Loading SSTGNN models...")
-    _det1 = pipeline("image-classification",
-                     model="dima806/deepfake_vs_real_image_detection",
-                     cache_dir=CACHE)
-    try:
-        _det2 = pipeline("image-classification",
-                         model="prithivMLmods/Deep-Fake-Detector-Model",
-                         cache_dir=CACHE)
-    except Exception:
-        logger.warning("SSTGNN backup detector unavailable")
-    _mesh = mp.solutions.face_mesh.FaceMesh(
-        static_image_mode=True, max_num_faces=1, refine_landmarks=True)
-    logger.info("SSTGNN models ready")
-def _fake_prob(preds: list[dict]) -> float:
-    fake_kw = {"fake", "deepfake", "artificial", "generated", "ai"}
-    return max(
-        (p["score"] for p in preds
-         if any(k in p["label"].lower() for k in fake_kw)),
-        default=0.5,
-    )
-class SSTGNNEngine:
-    def _ensure(self):
-        with _lock:
-            _load()
-    def run(self, image: Image.Image) -> EngineResult:
-        self._ensure()
-        if image.mode != "RGB":
-            image = image.convert("RGB")
-        scores = []
-        try:
-            scores.append(_fake_prob(_det1(image)) * 0.6)
-        except Exception as e:
-            logger.warning(f"SSTGNN det1 error: {e}")
-        if _det2:
-            try:
-                scores.append(_fake_prob(_det2(image)) * 0.4)
-            except Exception as e:
-                logger.warning(f"SSTGNN det2 error: {e}")
-        if not scores:
-            return EngineResult(engine="sstgnn", verdict="UNKNOWN",
-                                confidence=0.5, explanation="All detectors failed.")
-        cnn = sum(scores) / (0.6 if len(scores) == 1 else 1.0)
-        graph = self._geometry_score(np.array(image))
-        final = float(np.clip(cnn * 0.7 + graph * 0.3, 0.0, 1.0))
-        return EngineResult(
-            engine="sstgnn",
-            verdict="FAKE" if final > 0.5 else "REAL",
-            confidence=final,
-            explanation=f"CNN {cnn:.2f}, geometric graph anomaly {graph:.2f}.",
-        )
-    def _geometry_score(self, frame: np.ndarray) -> float:
-        try:
-            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
-            res = _mesh.process(rgb)
-            if not res.multi_face_landmarks:
-                return 0.3
-            h, w = frame.shape[:2]
-            lms = res.multi_face_landmarks[0].landmark
-            idxs = list(range(0, 468, 7))[:68]
-            pts  = np.array([[lms[i].x * w, lms[i].y * h] for i in idxs])
-            tri  = Delaunay(pts)
-            areas = []
-            for s in tri.simplices:
-                a, b, c = pts[s]
-                areas.append(abs(np.cross(b - a, c - a)) / 2)
-            areas = np.array(areas)
-            cv_score = float(np.std(areas) / (np.mean(areas) + 1e-9))
-            return float(np.clip((cv_score - 0.8) / 1.5, 0.0, 1.0))
-        except Exception as e:
-            logger.warning(f"Geometry score error: {e}")
-            return 0.3
-    def run_video(self, frames: list[np.ndarray]) -> EngineResult:
-        self._ensure()
-        if not frames:
-            return EngineResult(engine="sstgnn", verdict="UNKNOWN",
-                                confidence=0.5, explanation="No frames.")
-        sample = frames[::6] or [frames[0]]
-        results = [self.run(Image.fromarray(f)) for f in sample]
-        avg = float(np.mean([r.confidence for r in results]))
-        return EngineResult(
-            engine="sstgnn",
-            verdict="FAKE" if avg > 0.5 else "REAL",
-            confidence=avg,
-            explanation=f"Frame-sampled SSTGNN average {avg:.2f} over {len(sample)} frames.",
         )
-```
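In the removed `SSTGNNEngine.run` listing, the combined CNN score divides by 0.6 whenever only one detector returned a score. That is exact when the primary detector (weight 0.6) is the survivor, but if only the backup (weight 0.4) succeeded, the result would be scaled by 0.4 / 0.6. One possible alternative, shown here as a sketch with a hypothetical `weighted_mean` helper, is to renormalize by the weights of whichever detectors actually responded.

```python
def weighted_mean(pairs):
    """Combine (score, weight) pairs, renormalizing by the weights present.

    Returns None when no detector produced a score.
    """
    total_weight = sum(weight for _, weight in pairs)
    if total_weight == 0:
        return None
    return sum(score * weight for score, weight in pairs) / total_weight


if __name__ == "__main__":
    print(weighted_mean([(0.8, 0.6), (0.5, 0.4)]))  # both detectors answered
    print(weighted_mean([(0.5, 0.4)]))              # only the backup answered
    print(weighted_mean([]))                        # nothing answered
```

The caller would append `(score, weight)` tuples instead of pre-multiplied scores, and treat `None` the same way the listing treats an empty `scores` list.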
----
-## Fusion — `src/fusion/fuser.py`
-```python
-import numpy as np
-from src.types import EngineResult
-ENGINE_WEIGHTS = {
-    "fingerprint": 0.45,
-    "coherence":   0.35,
-    "sstgnn":      0.20,
-}
-ENGINE_WEIGHTS_VIDEO = {
-    "fingerprint": 0.30,
-    "coherence":   0.50,
-    "sstgnn":      0.20,
-}
-ATTRIBUTION_PRIORITY = {"fingerprint": 1, "sstgnn": 2, "coherence": 3}
-def fuse(
-    results: list[EngineResult],
-    is_video: bool = False,
-) -> tuple[str, float, str]:
-    """Returns (verdict, confidence, attributed_generator)."""
-    weights = ENGINE_WEIGHTS_VIDEO if is_video else ENGINE_WEIGHTS
-    active  = [r for r in results if r.verdict != "UNKNOWN"]
-    if not active:
-        return "UNKNOWN", 0.5, "unknown_gan"
-    wf = sum(r.confidence * weights.get(r.engine, 0.1)
-             for r in active if r.verdict == "FAKE")
-    wr = sum((1 - r.confidence) * weights.get(r.engine, 0.1)
-             for r in active if r.verdict == "REAL")
-    fake_prob = float(np.clip(wf / (wf + wr + 1e-9), 0.0, 1.0))
-    verdict   = "FAKE" if fake_prob > 0.5 else "REAL"
-    generator = "real"
-    if verdict == "FAKE":
-        for r in sorted(active, key=lambda r: ATTRIBUTION_PRIORITY.get(r.engine, 9)):
-            if r.attributed_generator and r.attributed_generator != "real":
-                generator = r.attributed_generator
-                break
-        if generator == "real":
-            generator = "unknown_gan"
-    return verdict, fake_prob, generator
 ```
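To make the fusion arithmetic above concrete, here is a small worked example using the image-mode weights from the listing. It is a sketch only: `Result` is a hypothetical stand-in for `EngineResult`, and `fake_probability` mirrors just the probability part of `fuse`, without generator attribution.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical stand-in for EngineResult (only the fields fusion reads)."""
    engine: str
    verdict: str       # "FAKE" | "REAL" | "UNKNOWN"
    confidence: float  # fake score in [0, 1]


WEIGHTS = {"fingerprint": 0.45, "coherence": 0.35, "sstgnn": 0.20}


def fake_probability(results, weights=WEIGHTS):
    """Weighted FAKE mass divided by total FAKE + REAL mass, as in the listing."""
    active = [r for r in results if r.verdict != "UNKNOWN"]
    if not active:
        return 0.5
    wf = sum(r.confidence * weights.get(r.engine, 0.1)
             for r in active if r.verdict == "FAKE")
    wr = sum((1 - r.confidence) * weights.get(r.engine, 0.1)
             for r in active if r.verdict == "REAL")
    return min(1.0, max(0.0, wf / (wf + wr + 1e-9)))


if __name__ == "__main__":
    results = [
        Result("fingerprint", "FAKE", 0.80),  # contributes 0.80 * 0.45 = 0.36
        Result("coherence", "REAL", 0.30),    # contributes (1 - 0.30) * 0.35 = 0.245
        Result("sstgnn", "FAKE", 0.60),       # contributes 0.60 * 0.20 = 0.12
    ]
    print(round(fake_probability(results), 3))
```

Here the FAKE mass is 0.48 and the REAL mass is 0.245, so the fused probability is 0.48 / 0.725, roughly 0.66, and the verdict would be FAKE.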
----
-## API — `src/api/main.py`
 ```python
-import asyncio, io, logging, os, time
-from pathlib import Path
-import cv2, numpy as np
-from fastapi import FastAPI, File, HTTPException, UploadFile
-from fastapi.middleware.cors import CORSMiddleware
-from PIL import Image
-from src.engines.fingerprint.engine import FingerprintEngine
-from src.engines.coherence.engine    import CoherenceEngine
-from src.engines.sstgnn.engine       import SSTGNNEngine
-from src.explainability.explainer    import explain
-from src.fusion.fuser                import fuse
-from src.services.inference_router   import route_inference
-from src.types                       import DetectionResponse
-logger = logging.getLogger(__name__)
-app = FastAPI(title="GenAI-DeepDetect", version="1.0.0")
-app.add_middleware(
-    CORSMiddleware,
-    allow_origins=["*"], allow_methods=["*"], allow_headers=["*"],
-)
-_fp = FingerprintEngine()
-_co = CoherenceEngine()
-_st = SSTGNNEngine()
-MAX_MB     = int(os.environ.get("MAX_VIDEO_SIZE_MB", 100))
-MAX_FRAMES = int(os.environ.get("MAX_VIDEO_FRAMES",  300))
-IMAGE_TYPES = {"image/jpeg", "image/png", "image/webp", "image/bmp"}
-VIDEO_TYPES = {"video/mp4", "video/quicktime", "video/x-msvideo", "video/webm"}
-def _extract_frames(path: str) -> list[np.ndarray]:
-    cap = cv2.VideoCapture(path)
     total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
-    step  = max(1, total // MAX_FRAMES)
-    frames, i = [], 0
-    while True:
         ret, frame = cap.read()
         if not ret:
             break
-        if i % step == 0:
-            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
-        i += 1
     cap.release()
-    return frames[:MAX_FRAMES]
-@app.on_event("startup")
-async def preload():
-    logger.info("Preloading models...")
-    await asyncio.gather(
-        asyncio.to_thread(_fp._ensure),
-        asyncio.to_thread(_co._ensure),
-        asyncio.to_thread(_st._ensure),
-    )
-    logger.info("All models preloaded")
-@app.get("/health")
-async def health():
-    return {"status": "ok"}
-@app.post("/detect/image", response_model=DetectionResponse)
-async def detect_image(file: UploadFile = File(...)):
-    t0 = time.monotonic()
-    if file.content_type not in IMAGE_TYPES:
-        raise HTTPException(400, f"Unsupported type: {file.content_type}")
-    data = await file.read()
-    if len(data) > MAX_MB * 1024 * 1024:
-        raise HTTPException(413, "File too large")
-    image = Image.open(io.BytesIO(data)).convert("RGB")
-    fp, co, st = await asyncio.gather(
-        asyncio.to_thread(_fp.run, image),
-        asyncio.to_thread(_co.run, image),
-        asyncio.to_thread(_st.run, image),
-    )
-    ms = (time.monotonic() - t0) * 1000
-    for r in [fp, co, st]:
-        r.processing_time_ms = ms
-    verdict, conf, gen = fuse([fp, co, st], is_video=False)
-    expl = await asyncio.to_thread(explain, verdict, conf, [fp, co, st], gen)
-    return DetectionResponse(
-        verdict=verdict, confidence=conf, attributed_generator=gen,
-        explanation=expl, processing_time_ms=ms,
-        engine_breakdown=[fp, co, st],
-    )
-@app.post("/detect/video", response_model=DetectionResponse)
-async def detect_video(file: UploadFile = File(...)):
-    t0 = time.monotonic()
-    if file.content_type not in VIDEO_TYPES:
-        raise HTTPException(400, f"Unsupported type: {file.content_type}")
-    data = await file.read()
-    if len(data) > MAX_MB * 1024 * 1024:
-        raise HTTPException(413, "File too large")
-    # Route heavy videos to RunPod
-    if len(data) > 20 * 1024 * 1024:
-        return await route_inference(data, "video")
-    tmp = Path(f"/tmp/vid_{int(time.time()*1000)}.mp4")
-    tmp.write_bytes(data)
-    try:
-        frames = await asyncio.to_thread(_extract_frames, str(tmp))
-    finally:
-        tmp.unlink(missing_ok=True)
-    if not frames:
-        raise HTTPException(422, "Could not extract frames")
-    fp, co, st = await asyncio.gather(
-        asyncio.to_thread(_fp.run_video, frames),
-        asyncio.to_thread(_co.run_video, frames),
-        asyncio.to_thread(_st.run_video, frames),
-    )
-    ms = (time.monotonic() - t0) * 1000
-    for r in [fp, co, st]:
-        r.processing_time_ms = ms
-    verdict, conf, gen = fuse([fp, co, st], is_video=True)
-    expl = await asyncio.to_thread(explain, verdict, conf, [fp, co, st], gen)
-    return DetectionResponse(
-        verdict=verdict, confidence=conf, attributed_generator=gen,
-        explanation=expl, processing_time_ms=ms,
-        engine_breakdown=[fp, co, st],
-    )
 ```
----
-## Types — `src/types.py`
-```python
-from __future__ import annotations
-from typing import Optional
-from pydantic import BaseModel
-GENERATOR_LABELS = {
-    0: "real",
-    1: "unknown_gan",
-    2: "stable_diffusion",
-    3: "midjourney",
-    4: "dall_e",
-    5: "flux",
-    6: "firefly",
-    7: "imagen",
-}
-class EngineResult(BaseModel):
-    engine: str
-    verdict: str                            # FAKE | REAL | UNKNOWN
-    confidence: float                       # 0–1
-    attributed_generator: Optional[str] = None
-    explanation: str = ""
-    processing_time_ms: float = 0.0
-class DetectionResponse(BaseModel):
-    verdict: str
-    confidence: float
-    attributed_generator: str
-    explanation: str
-    processing_time_ms: float
-    engine_breakdown: list[EngineResult]
-```
----
-## Inference Router — `src/services/inference_router.py`
 ```python
-import base64, logging, os
-import httpx
-from src.types import DetectionResponse
-logger = logging.getLogger(__name__)
-RUNPOD_KEY = os.environ.get("RUNPOD_API_KEY", "")
-RUNPOD_EID = os.environ.get("RUNPOD_ENDPOINT_ID", "")
-async def route_inference(data: bytes, media_type: str) -> DetectionResponse:
-    if not RUNPOD_KEY or not RUNPOD_EID:
-        raise RuntimeError(
-            "RunPod not configured. Set RUNPOD_API_KEY and RUNPOD_ENDPOINT_ID."
         )
-    url     = f"https://api.runpod.ai/v2/{RUNPOD_EID}/runsync"
-    payload = {"input": {"data": base64.b64encode(data).decode(),
-                         "media_type": media_type}}
-    async with httpx.AsyncClient(timeout=120) as client:
-        resp = await client.post(url, json=payload,
-                                 headers={"Authorization": f"Bearer {RUNPOD_KEY}"})
-        resp.raise_for_status()
-        return DetectionResponse(**resp.json()["output"])
 ```
----
-## RunPod Handler — `runpod_handler.py` (project root)
 ```python
-import base64, io, os, tempfile
-import runpod, cv2, numpy as np
 from PIL import Image
-os.environ.setdefault("MODEL_CACHE_DIR", "/tmp/models")
-from src.engines.fingerprint.engine import FingerprintEngine
-from src.engines.coherence.engine    import CoherenceEngine
-from src.engines.sstgnn.engine       import SSTGNNEngine
-from src.explainability.explainer    import explain
-from src.fusion.fuser                import fuse
-_fp = FingerprintEngine()
-_co = CoherenceEngine()
-_st = SSTGNNEngine()
-def handler(job: dict) -> dict:
-    inp        = job["input"]
-    raw        = base64.b64decode(inp["data"])
-    media_type = inp.get("media_type", "image")
-    if media_type == "image":
-        image = Image.open(io.BytesIO(raw)).convert("RGB")
-        fp = _fp.run(image)
-        co = _co.run(image)
-        st = _st.run(image)
-        verdict, conf, gen = fuse([fp, co, st], is_video=False)
-    else:
-        with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as f:
-            f.write(raw)
-            tmp = f.name
-        try:
-            cap = cv2.VideoCapture(tmp)
-            frames, i = [], 0
-            while True:
-                ret, frame = cap.read()
-                if not ret:
-                    break
-                if i % 4 == 0:
-                    frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
-                i += 1
-            cap.release()
-        finally:
-            os.unlink(tmp)
-        fp = _fp.run_video(frames)
-        co = _co.run_video(frames)
-        st = _st.run_video(frames)
-        verdict, conf, gen = fuse([fp, co, st], is_video=True)
-    expl = explain(verdict, conf, [fp, co, st], gen)
-    return {
-        "verdict": verdict,
-        "confidence": conf,
-        "attributed_generator": gen,
-        "explanation": expl,
-        "processing_time_ms": 0.0,
-        "engine_breakdown": [r.model_dump() for r in [fp, co, st]],
-    }
-runpod.serverless.start({"handler": handler})
 ```
 ---
-## Hosting
-### Option A — HuggingFace Spaces (Free, CPU, primary API host)
-**`spaces/app.py`**:
 ```python
-import os
-os.environ.setdefault("MODEL_CACHE_DIR", "/data/models")
-os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
-import uvicorn
-from src.api.main import app
-if __name__ == "__main__":
-    uvicorn.run(app, host="0.0.0.0", port=7860, workers=1)
 ```
-**Root `README.md`** front-matter (Hugging Face reads this file):
-```yaml
----
-title: GenAI DeepDetect
-emoji: "🔍"
-colorFrom: gray
-colorTo: indigo
-sdk: docker
-app_port: 7860
-pinned: false
----
-```
-**`Dockerfile`** (replace existing):
-```dockerfile
-FROM python:3.11-slim
-RUN apt-get update && apt-get install -y \
-    ffmpeg libgl1-mesa-glx libglib2.0-0 libsm6 libxext6 libxrender-dev \
-    && rm -rf /var/lib/apt/lists/*
-WORKDIR /app
-COPY requirements.txt .
-RUN pip install --no-cache-dir -r requirements.txt
-COPY . .
-ENV MODEL_CACHE_DIR=/data/models
-ENV TOKENIZERS_PARALLELISM=false
-ENV PYTHONUNBUFFERED=1
-EXPOSE 7860
-CMD ["python", "spaces/app.py"]
-```
-**Secrets to set in HF Spaces** (Settings → Repository secrets):
 ```
-GEMINI_API_KEY
-HF_TOKEN
-RUNPOD_API_KEY
-RUNPOD_ENDPOINT_ID
-```
-**Free tier**: 2 vCPU, 16GB RAM, persistent `/data` volume. Models cache to
-`/data/models` and survive container restarts. Cold start first request: ~90s.
-Warm: <5s. GPU upgrade: T4 at $0.05/hr if needed.
 ---
-### Option B — RunPod Serverless (GPU, heavy video, low cost)
-1. RunPod → Serverless → New Endpoint
-2. Select template: `runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04`
-3. Set handler file: `runpod_handler.py`
-4. Min replicas: 0, Max: 3
-5. GPU: RTX 3090 or A40 (cheapest that works)
-6. Set env vars: `GEMINI_API_KEY`, `HF_TOKEN`, `MODEL_CACHE_DIR=/tmp/models`
-**Cost**: ~$0.0002/request on H100. Billed per second. Min workers = 0 means you
-pay nothing when idle — cold start is ~15s.
-**When it triggers**: `inference_router.py` automatically sends videos >20MB to
-RunPod. Images always run on HF Spaces.
----
-## Frontend — `frontend/lib/api.ts`
-```typescript
-const BASE_URL =
-	process.env.NEXT_PUBLIC_API_URL ??
-	'https://YOUR-USERNAME-genai-deepdetect.hf.space';
-export type GeneratorLabel =
-	| 'real'
-	| 'unknown_gan'
-	| 'stable_diffusion'
-	| 'midjourney'
-	| 'dall_e'
-	| 'flux'
-	| 'firefly'
-	| 'imagen';
-export interface EngineResult {
-	engine: string;
-	verdict: 'FAKE' | 'REAL' | 'UNKNOWN';
-	confidence: number;
-	attributed_generator: GeneratorLabel | null;
-	explanation: string;
-	processing_time_ms: number;
-}
-export interface DetectionResponse {
-	verdict: 'FAKE' | 'REAL' | 'UNKNOWN';
-	confidence: number;
-	attributed_generator: GeneratorLabel;
-	explanation: string;
-	processing_time_ms: number;
-	engine_breakdown: EngineResult[];
-}
-async function _post(endpoint: string, file: File): Promise<DetectionResponse> {
-	const form = new FormData();
-	form.append('file', file);
-	const res = await fetch(`${BASE_URL}${endpoint}`, {
-		method: 'POST',
-		body: form,
-	});
-	if (!res.ok) {
-		const err = await res.text();
-		throw new Error(`Detection failed (${res.status}): ${err}`);
-	}
-	return res.json();
-}
-export const detectImage = (file: File) => _post('/detect/image', file);
-export const detectVideo = (file: File) => _post('/detect/video', file);
-```
-Set in `frontend/.env.local`:
-```
-NEXT_PUBLIC_API_URL=https://your-username-genai-deepdetect.hf.space
-```
----
-## Dependencies — `requirements.txt`
-```
-# API
-fastapi>=0.111.0
-uvicorn[standard]>=0.29.0
-python-multipart>=0.0.9
-aiofiles>=23.2.1
-httpx>=0.27.0
-pydantic>=2.7.0
-# ML — fingerprint
-transformers>=4.40.0
-timm>=1.0.0
-torch>=2.1.0
-torchvision>=0.16.0
-# ML — coherence
-facenet-pytorch>=2.5.3
-mediapipe>=0.10.14
-opencv-python-headless>=4.9.0
-# ML — sstgnn
-torch-geometric>=2.5.0
-scipy>=1.13.0
-# Explainability — Gemini
-google-generativeai>=0.8.0
-# HuggingFace
-huggingface-hub>=0.23.0
-# RunPod serverless handler
-runpod>=1.6.0
-# Continual learning
-apscheduler>=3.10.4
-# Utils
-Pillow>=10.3.0
-numpy>=1.26.0
 ```
 ---
-## Bug Checklist — Fix Before Running
-### `src/types.py`
-- [ ] `EngineResult` missing `attributed_generator: Optional[str] = None` — add
-      it
-- [ ] `DetectionResponse.engine_breakdown` typed as `list[dict]` — change to
-      `list[EngineResult]`
-### `src/fusion/fuser.py`
-- [ ] `fuse()` returns 2-tuple — update to return 3-tuple
-      `(verdict, conf, generator)`
-- [ ] Update all callers in `main.py` accordingly
-### `src/explainability/explainer.py`
-- [ ] References `anthropic` SDK — replace entirely with Gemini implementation
-      above
-### `src/api/main.py`
-- [ ] Missing CORS middleware — add before deploy
-- [ ] Missing `@app.on_event("startup")` preload — add it
-- [ ] Missing `_extract_frames()` for video — add it
-- [ ] `detect_video` likely missing or stubbed — implement fully
-### `src/engines/*/` directories
-- [ ] All three engine files are stubs or empty — replace with full code above
-### `spaces/app.py`
-- [ ] Likely empty — add uvicorn entrypoint
-### `Dockerfile`
-- [ ] Check for `ffmpeg` and `libgl1-mesa-glx` — required for MediaPipe + OpenCV
-- [ ] Check `EXPOSE 7860` matches HF Spaces `app_port`
-### `src/services/inference_router.py`
-- [ ] Likely stub — implement `route_inference()` with RunPod httpx call
----
-## Code Standards
-- Lazy-load all models behind a threading lock — never load at module import
-- Wrap all model inference in `asyncio.to_thread()` — never block the event loop
-- Type hints on every function
-- `logging.getLogger(__name__)` not `print()`
-- `os.environ.get()` not hardcoded secrets
-- Pydantic `BaseModel` for all response schemas
-- Next.js: pages router only — no `app/` dir, no `src/` dir
-- Font: Plus Jakarta Sans or DM Sans — never Inter, Roboto, Arial
-- Border radius: 22% icon containers, 18px cards, 12px buttons
 ---
-## MCP Usage Rules
-Every coding session must follow these rules:
 ```
-1. Adding a dependency?
-   → context7: resolve-library-id <package>
-   → context7: query-docs <package> <specific feature>
-2. Using any HF model?
-   → huggingface: model_details <model-id>
-   → confirm size, license, task, input format
-3. Modifying engine logic?
-   → context7: query-docs transformers pipeline (fingerprint)
-   → context7: query-docs mediapipe face_mesh (coherence)
-   → context7: query-docs torch-geometric GCNConv (sstgnn)
-   → context7: query-docs facenet-pytorch (coherence embeddings)
-4. Modifying Gemini calls?
-   → context7: query-docs google-generativeai GenerativeModel
-5. Modifying RunPod handler?
-   → context7: query-docs runpod serverless handler
-6. Modifying FastAPI routes?
-   → context7: query-docs fastapi UploadFile
-7. Frontend API changes?
-   → context7: query-docs next.js pages-router fetch
-```
 ---
-## Friday Deploy Checklist
-```
-[ ] pip install -r requirements.txt  (no errors)
-[ ] src/types.py  — EngineResult has attributed_generator
-[ ] src/types.py  — DetectionResponse has engine_breakdown: list[EngineResult]
-[ ] src/fusion/fuser.py  — returns 3-tuple
-[ ] src/explainability/explainer.py  — uses Gemini, no anthropic import
-[ ] src/engines/fingerprint/engine.py  — full implementation
-[ ] src/engines/coherence/engine.py  — full implementation
-[ ] src/engines/sstgnn/engine.py  — full implementation
-[ ] src/api/main.py  — CORS + startup preload + video route
-[ ] src/services/inference_router.py  — RunPod httpx call
-[ ] runpod_handler.py  — added to project root
-[ ] spaces/app.py  — uvicorn entrypoint
-[ ] Dockerfile  — has ffmpeg, libgl1, EXPOSE 7860
-[ ] HF Space created + secrets set + pushed
-[ ] RunPod endpoint deployed + endpoint ID noted
-[ ] frontend/.env.local  — NEXT_PUBLIC_API_URL points to HF Space
-[ ] Vercel deploy of frontend/
-Smoke tests:
-[ ] GET /health → {"status":"ok"}
-[ ] POST /detect/image (real JPEG) → verdict REAL
-[ ] POST /detect/image (AI PNG)    → verdict FAKE
-[ ] POST /detect/video (MP4 <20MB) → response within 30s
-[ ] POST /detect/video (MP4 >20MB) → routes to RunPod
-```

+# GenAI-DeepDetect: Final Implementation PRD
+**Deadline: Tonight, 12:00 AM**
+**Deploy to: HuggingFace Spaces (Gradio)**
+**LLM: NVIDIA NIM free API (Llama-3.1-8B-Instruct)**
+**Everything else: HuggingFace pretrained models**
+**Only training needed: Module 3 (SSTGNN) on L40S (~5 hrs, ~$6)**
 ---
+## What You Are Building
+A Gradio app on HuggingFace Spaces that takes a video, runs 4 detection modules,
+fuses scores, calls NVIDIA NIM for a natural-language explanation, and returns:
+1. **FakeScore** (0-1, higher = more likely fake)
+2. **Per-module scores** (lip-sync, fingerprint, graph-GNN)
+3. **Generator attribution** (which AI tool made this)
+4. **Natural-language explanation** (from Llama via NVIDIA NIM)
 ---
+## Module Source Map
+| Module    | What                          | Source                                  | Weights                                     | Training?     |
+| --------- | ----------------------------- | --------------------------------------- | ------------------------------------------- | ------------- |
+| M1        | Lip-sync detection            | `github.com/AaronComo/LipFD`            | Official `ckpt.pth` from their Google Drive | NO            |
+| M2        | Deepfake binary + attribution | `yermandy/deepfake-detection` on HF     | Auto-downloads via transformers             | NO            |
+| M3        | Graph spatio-temporal GNN     | arXiv:2508.05526 (implement yourself)   | Train on L40S, push to HF Hub               | YES (~5 hrs)  |
+| M5-fusion | Score aggregation             | 3-input MLP                             | Train on CPU in 5 minutes                   | YES (trivial) |
+| M5-llm    | Explanation generation        | NVIDIA NIM `meta/llama-3.1-8b-instruct` | API call, no weights needed                 | NO            |
 ---
+## File Structure (copy this exactly)
 ```
+GenAI-DeepDetect/
+├── app.py                          # Gradio UI entry point
+├── requirements.txt
+├── packages.txt                    # system deps: ffmpeg, libsndfile1
+├── .env.example                    # NVIDIA_API_KEY placeholder
+│
+├── modules/
+│   ├── __init__.py
+│   ├── m1_lipsync.py              # LipFD pretrained wrapper
+│   ├── m2_fingerprint.py          # CLIP deepfake detector wrapper
+│   ├── m3_sstgnn.py               # SSTGNN inference (your trained model)
+│   ├── m5_fusion.py               # Attention MLP
+│   └── m5_explain.py              # NVIDIA NIM Llama API caller
+│
+├── utils/
+│   ├── video.py                   # Frame/audio extraction with ffmpeg
+│   └── graph.py                   # Spatial-patch graph builder for M3
+│
+├── weights/
+│   └── fusion_mlp.pt             # Tiny MLP (~12KB), committed to repo
+│
+├── test_assets/                   # 2 short clips for validation
+│   ├── real_sample.mp4
+│   └── fake_sample.mp4
+│
+└── README.md                      # HF Space model card
 ```
 ---
+## requirements.txt
+```
+torch>=2.1.0
+torchvision>=0.16.0
+torchaudio>=2.1.0
+torch-geometric>=2.4.0
+transformers>=4.36.0
+gradio>=4.0.0
+opencv-python-headless>=4.8.0
+librosa>=0.10.0
+numpy>=1.24.0
+Pillow>=10.0.0
+openai>=1.0.0
+huggingface-hub>=0.19.0
+soundfile>=0.12.0
+```
+## packages.txt
+```
+ffmpeg
+libsndfile1-dev
+```
 ---
+## Module 1: Lip-Sync (LipFD Pretrained)
+### What it does
+Takes video frames + audio and outputs a lip-sync mismatch score in [0, 1]. A
+higher score means the lips are less likely to match the audio (more likely fake).
+### Source
+- Repo: `https://github.com/AaronComo/LipFD`
+- Checkpoint: download `ckpt.pth` from their Google Drive link in the README
+- Re-upload to your HF Hub: `AkshatAgarwal/LipFD-checkpoint`
+### Setup (one-time)
+```bash
+# Clone LipFD repo
+git clone https://github.com/AaronComo/LipFD.git
+# Download their pretrained checkpoint (link in their README)
+# Then upload to your own HF repo so it auto-downloads in the Space
+huggingface-cli upload AkshatAgarwal/LipFD-checkpoint ckpt.pth .
+```
+### Implementation: modules/m1_lipsync.py
 ```python
+import torch
+import cv2
+import librosa
+import numpy as np
+from huggingface_hub import hf_hub_download
+class LipSyncModule:
+    """
+    LipFD pretrained lip-sync deepfake detector.
+    Source: github.com/AaronComo/LipFD (NeurIPS 2024)
+    Expected output: score in [0,1], higher = more likely fake
+    """
+    def __init__(self, cache_dir="/data/model_cache"):
+        self.device = "cuda" if torch.cuda.is_available() else "cpu"
+        self.cache_dir = cache_dir
+        self._load_model()
+    def _load_model(self):
+        ckpt_path = hf_hub_download(
+            repo_id="AkshatAgarwal/LipFD-checkpoint",
+            filename="ckpt.pth",
+            cache_dir=self.cache_dir
+        )
+        # Copy LipFD model definition files into modules/lipfd/
+        from modules.lipfd.model import LipFDNet
+        self.model = LipFDNet()
+        state_dict = torch.load(ckpt_path, map_location=self.device)
+        self.model.load_state_dict(state_dict)
+        self.model.to(self.device)
+        self.model.eval()
+    @torch.no_grad()
+    def score(self, video_path: str) -> dict:
+        frames, audio, fps = self._preprocess(video_path)
+        if frames is None or audio is None:
+            return {"s1": 0.5, "segments": [], "note": "no_face_or_audio"}
+        frames_t = torch.tensor(frames, dtype=torch.float32).to(self.device)
+        audio_t = torch.tensor(audio, dtype=torch.float32).to(self.device)
+        logits = self.model(frames_t, audio_t)
+        score = torch.sigmoid(logits).mean().item()
+        return {"s1": score, "segments": self._get_segments(logits, fps)}
+    def _preprocess(self, video_path: str):
+        cap = cv2.VideoCapture(video_path)
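+        # Some containers report 0 fps; 25.0 is an assumed fallback so that
+        # timestamp math (i / fps) in _get_segments never divides by zero.
+        fps = cap.get(cv2.CAP_PROP_FPS) or 25.0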
+        frames = []
+        while cap.isOpened():
+            ret, frame = cap.read()
+            if not ret:
+                break
+            lip_crop = self._extract_lip_region(frame)
+            if lip_crop is not None:
+                lip_crop = cv2.resize(lip_crop, (96, 96))
+                frames.append(lip_crop)
+        cap.release()
+        if len(frames) < 5:
+            return None, None, fps
+        audio, sr = librosa.load(video_path, sr=16000)
+        mel = librosa.feature.melspectrogram(y=audio, sr=sr)
+        frames = np.array(frames).transpose(0, 3, 1, 2) / 255.0
+        return frames, mel, fps
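+    def _extract_lip_region(self, frame):
+        # Build the Haar cascade once and reuse it; constructing it per frame is slow.
+        if not hasattr(self, "_face_cascade"):
+            self._face_cascade = cv2.CascadeClassifier(
+                cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
         )
+        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
+        faces = self._face_cascade.detectMultiScale(gray, 1.3, 5)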
+        if len(faces) == 0:
+            return None
+        x, y, w, h = faces[0]
+        lip_y = y + int(h * 0.65)
+        lip_h = int(h * 0.35)
+        lip_x = x + int(w * 0.2)
+        lip_w = int(w * 0.6)
+        return frame[lip_y:lip_y+lip_h, lip_x:lip_x+lip_w]
+    def _get_segments(self, logits, fps):
+        # Flatten so 0-d or multi-dimensional logits iterate as scalar scores.
+        scores = torch.sigmoid(logits).flatten().cpu().numpy()
+        segments = []
+        for i, s in enumerate(scores):
+            if s > 0.6:
+                segments.append({"time": round(i / fps, 2), "score": round(float(s), 3)})
+        return segments
 ```
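As a hedged illustration of the segment-flagging step in `_get_segments`, the sketch below maps per-step fake probabilities to timestamps. It is a standalone sketch, not LipFD code: the function name `flag_segments`, the `fallback_fps` parameter, and its 25.0 default are assumptions introduced here for illustration. The 0.6 threshold mirrors the value used above.

```python
from typing import Dict, List, Sequence


def flag_segments(scores: Sequence[float], fps: float,
                  threshold: float = 0.6,
                  fallback_fps: float = 25.0) -> List[Dict[str, float]]:
    """Return {"time", "score"} entries for steps whose score exceeds threshold.

    fallback_fps is a hypothetical default used when the container reports
    no usable frame rate (a value of 0 would otherwise divide by zero).
    """
    rate = fps if fps and fps > 0 else fallback_fps
    flagged = []
    for i, s in enumerate(scores):
        if s > threshold:
            flagged.append({"time": round(i / rate, 2), "score": round(float(s), 3)})
    return flagged


# Toy scores at 2 steps per second: only steps 1 and 2 exceed the threshold.
segments = flag_segments([0.2, 0.7, 0.9, 0.4], fps=2.0)
print(segments)
```

Keeping this logic free of model code makes it easy to unit-test the timestamp math separately from inference.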
 ---
+## Module 2: Style Fingerprinting (CLIP Pretrained)
+### Source
+- HuggingFace: `yermandy/deepfake-detection`
+- Auto-downloads, no manual setup
+### Implementation: modules/m2_fingerprint.py
 ```python
+import torch
+import cv2
 import numpy as np
+from transformers import (
+    AutoModelForImageClassification, AutoProcessor,
+    CLIPModel, CLIPTokenizer, CLIPProcessor
+)
 from PIL import Image
+GENERATORS = [
+    "Sora", "Runway Gen-2", "Wav2Lip",
+    "Stable Diffusion v1.5", "SDXL",
+    "Midjourney v6", "DALL-E 3", "Unknown/OOD"
+]
+class FingerprintModule:
+    def __init__(self, cache_dir="/data/model_cache"):
+        self.device = "cuda" if torch.cuda.is_available() else "cpu"
+        self.model = AutoModelForImageClassification.from_pretrained(
+            "yermandy/deepfake-detection", cache_dir=cache_dir
+        ).to(self.device)
+        self.processor = AutoProcessor.from_pretrained(
+            "yermandy/deepfake-detection", cache_dir=cache_dir
         )
+        self.model.eval()
+        self.clip = CLIPModel.from_pretrained(
+            "openai/clip-vit-large-patch14", cache_dir=cache_dir
+        ).to(self.device)
+        self.clip_tok = CLIPTokenizer.from_pretrained(
+            "openai/clip-vit-large-patch14", cache_dir=cache_dir
         )
+        self.clip_proc = CLIPProcessor.from_pretrained(
+            "openai/clip-vit-large-patch14", cache_dir=cache_dir
+        )
+        self.clip.eval()
+        self._precompute_generator_embeddings()
+    def _precompute_generator_embeddings(self):
+        prompts = [f"An image generated by {g} AI model" for g in GENERATORS]
+        tokens = self.clip_tok(prompts, padding=True, return_tensors="pt")
+        tokens = {k: v.to(self.device) for k, v in tokens.items()}
+        with torch.no_grad():
+            self.gen_embeds = self.clip.get_text_features(**tokens)
+            self.gen_embeds = self.gen_embeds / self.gen_embeds.norm(dim=-1, keepdim=True)
+    @torch.no_grad()
+    def score(self, video_path: str) -> dict:
+        frames = self._extract_frames(video_path, n=16)
+        if not frames:
+            return {"s2": 0.5, "attribution": {}, "top_generator": "Unknown"}
+        fake_scores = []
         for frame in frames:
+            inputs = self.processor(images=frame, return_tensors="pt")
+            inputs = {k: v.to(self.device) for k, v in inputs.items()}
+            logits = self.model(**inputs).logits
+            prob = torch.softmax(logits, dim=-1)
+            fake_prob = prob[0][1].item() if prob.shape[-1] > 1 else prob[0][0].item()
+            fake_scores.append(fake_prob)
+        s2 = sum(fake_scores) / len(fake_scores)
+        attribution = self._attribute(frames) if s2 > 0.5 else {}
+        top_gen = max(attribution, key=attribution.get) if attribution else "Unknown"
+        return {"s2": s2, "attribution": attribution, "top_generator": top_gen}
+    def _attribute(self, frames: list) -> dict:
+        img_embeds = []
+        for frame in frames[:8]:
+            inputs = self.clip_proc(images=frame, return_tensors="pt")
+            inputs = {k: v.to(self.device) for k, v in inputs.items()}
+            embed = self.clip.get_image_features(**inputs)
+            embed = embed / embed.norm(dim=-1, keepdim=True)
+            img_embeds.append(embed)
+        avg_embed = torch.cat(img_embeds).mean(dim=0, keepdim=True)
+        sims = (avg_embed @ self.gen_embeds.T).squeeze()
+        probs = torch.softmax(sims * 10, dim=-1)
+        return {GENERATORS[i]: round(probs[i].item(), 4) for i in range(len(GENERATORS))}
+    def _extract_frames(self, video_path: str, n: int = 16) -> list:
+        cap = cv2.VideoCapture(video_path)
+        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
+        indices = np.linspace(0, max(total-1, 0), n, dtype=int) if total > 0 else []
+        frames = []
+        for idx in indices:
+            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
+            ret, frame = cap.read()
+            if ret:
+                frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
+        cap.release()
+        return frames
 ```
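The attribution step in `_attribute` is a zero-shot comparison: cosine similarity between an averaged image embedding and one text embedding per generator, scaled by 10 and passed through a softmax. The sketch below reproduces only that arithmetic with NumPy on made-up 2-D vectors; the function name `attribute`, the labels, and the toy embeddings are assumptions for illustration and are not real CLIP features.

```python
import numpy as np


def attribute(image_embed: np.ndarray, text_embeds: np.ndarray,
              labels: list, temperature: float = 10.0) -> dict:
    """Softmax over scaled cosine similarities between one image and N prompts."""
    img = image_embed / np.linalg.norm(image_embed)
    txt = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    sims = txt @ img                      # cosine similarity per label, shape (N,)
    logits = sims * temperature
    logits = logits - logits.max()        # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return {label: float(p) for label, p in zip(labels, probs)}


# Toy 2-D embeddings (made up for illustration).
labels = ["gen_a", "gen_b", "gen_c"]
text = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
result = attribute(np.array([0.9, 0.1]), text, labels)
top = max(result, key=result.get)
print(top, result)
```

A larger scale factor sharpens the distribution toward the closest prompt; a smaller one flattens it. The probabilities are relative to the listed prompts only, so they should not be read as calibrated evidence that a given tool produced the video.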
 ---
+## Module 3: SSTGNN (Train Once on L40S, Deploy from HF Hub)
+### SSTGNN Architecture: modules/sstgnn_model.py
 ```python
+import torch
+import torch.nn as nn
+from torch_geometric.nn import global_mean_pool
+from torch_geometric.utils import degree
+class SpectralFilterLayer(nn.Module):
+    def __init__(self, in_ch, out_ch, K=3):
+        super().__init__()
+        self.coeffs = nn.ParameterList([
+            nn.Parameter(torch.randn(in_ch, out_ch) * 0.01) for _ in range(K)
+        ])
+        self.K = K
+    def forward(self, x, edge_index):
+        out = x @ self.coeffs[0]
+        x_k = x
+        for k in range(1, self.K):
+            row, col = edge_index
+            deg = degree(col, x.size(0), dtype=x.dtype).clamp(min=1)
+            norm = deg.pow(-0.5)
+            aggr = torch.zeros_like(x)
+            aggr.index_add_(0, col, norm[col].unsqueeze(-1) * x_k[row] * norm[row].unsqueeze(-1))
+            x_k = aggr
+            out = out + x_k @ self.coeffs[k]
+        return torch.relu(out)
+class TemporalDiffModule(nn.Module):
+    def __init__(self, T, out_dim=32):
+        super().__init__()
+        self.proj = nn.Linear(T, out_dim)
+    def forward(self, x_seq):
+        fft = torch.fft.fft(x_seq, dim=1).abs()
+        fft_pooled = fft.mean(dim=-1)
+        return self.proj(fft_pooled)
+class SSTGNN(nn.Module):
+    def __init__(self, patch_feat_dim=8, hidden_dim=128, num_frames=32,
+                 num_spectral_layers=3, spectral_K=3, fft_dim=32):
+        super().__init__()
+        self.input_proj = nn.Linear(patch_feat_dim + fft_dim, hidden_dim)
+        self.spectral_layers = nn.ModuleList([
+            SpectralFilterLayer(hidden_dim, hidden_dim, K=spectral_K)
+            for _ in range(num_spectral_layers)
+        ])
+        self.temporal = TemporalDiffModule(T=num_frames, out_dim=fft_dim)
+        self.classifier = nn.Sequential(
+            nn.Linear(hidden_dim, 64), nn.ReLU(),
+            nn.Dropout(0.3), nn.Linear(64, 1)
         )
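+    def forward(self, data):
+        # x_temporal has one row per patch (num_graphs * P, T, feat), while
+        # data.x has one row per node (num_graphs * T * P, feat).
+        fft_feat = self.temporal(data.x_temporal)
+        # Nodes are ordered frame-major (node = t * P + p), so repeat each patch's
+        # FFT feature across its T frames before concatenating with data.x.
+        # Assumes every graph in the batch shares the same T and P.
+        T = data.x_temporal.size(1)
+        num_graphs = int(data.batch.max()) + 1
+        fft_feat = fft_feat.view(num_graphs, 1, -1, fft_feat.size(-1))
+        fft_feat = fft_feat.expand(-1, T, -1, -1).reshape(-1, fft_feat.size(-1))
+        x = torch.cat([data.x, fft_feat], dim=-1)
+        x = self.input_proj(x)
+        for layer in self.spectral_layers:
+            x = layer(x, data.edge_index) + x
+        x = global_mean_pool(x, data.batch)
+        return self.classifier(x).squeeze(-1)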
 ```
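To make the `SpectralFilterLayer` computation easier to check by hand, here is a minimal NumPy sketch of the same polynomial filter, assuming a dense symmetric adjacency matrix: the output is `relu(sum_k (A_hat^k x) W_k)` with `A_hat = D^-1/2 A D^-1/2`. The function name `spectral_filter` and the 3-node path graph are illustrative assumptions; this is not the reference implementation from the cited paper.

```python
import numpy as np


def spectral_filter(x: np.ndarray, adj: np.ndarray, weights: list) -> np.ndarray:
    """Polynomial graph filter: relu(sum_k (A_hat^k x) W_k)."""
    deg = np.clip(adj.sum(axis=1), 1, None)       # clamp so isolated nodes do not divide by zero
    d_inv_sqrt = deg ** -0.5
    a_hat = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    out = np.zeros((x.shape[0], weights[0].shape[1]))
    x_k = x
    for k, w in enumerate(weights):
        if k > 0:
            x_k = a_hat @ x_k                     # one more hop of normalized propagation
        out += x_k @ w
    return np.maximum(out, 0.0)


# Path graph 0 - 1 - 2 with a unit signal on node 0 and identity weights (K = 2).
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
x = np.array([[1.0], [0.0], [0.0]])
out = spectral_filter(x, adj, [np.eye(1), np.eye(1)])
print(out.ravel())
```

With K = 2 the signal reaches only the direct neighbor of node 0, which shows why K bounds the receptive field of a single layer.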
+### Graph Builder: utils/graph.py
 ```python
+import torch, cv2, numpy as np
+from torch_geometric.data import Data
+def video_to_graph(video_path: str, patch_size=16, max_frames=32):
+    cap = cv2.VideoCapture(video_path)
     total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
+    indices = np.linspace(0, max(total-1, 0), max_frames, dtype=int)
+    all_patches = []
+    for idx in indices:
+        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
         ret, frame = cap.read()
         if not ret:
             break
+        frame = cv2.resize(frame, (224, 224)).astype(np.float32) / 255.0
+        n_h, n_w = 224 // patch_size, 224 // patch_size
+        frame_patches = []
+        for i in range(n_h):
+            for j in range(n_w):
+                patch = frame[i*patch_size:(i+1)*patch_size, j*patch_size:(j+1)*patch_size]
+                feat = np.concatenate([patch.mean(axis=(0,1)), patch.std(axis=(0,1)), [i/n_h, j/n_w]])
+                frame_patches.append(feat)
+        all_patches.append(frame_patches)
     cap.release()
+    T = len(all_patches)
+    n_h, n_w = 224 // patch_size, 224 // patch_size
+    n_patches = n_h * n_w
+    x = torch.tensor(np.array(all_patches).reshape(-1, 8), dtype=torch.float32)
+    edges = []
+    for t in range(T):
+        off = t * n_patches
+        for i in range(n_h):
+            for j in range(n_w):
+                nid = off + i * n_w + j
+                if j+1 < n_w:
+                    edges += [[nid, off+i*n_w+j+1], [off+i*n_w+j+1, nid]]
+                if i+1 < n_h:
+                    edges += [[nid, off+(i+1)*n_w+j], [off+(i+1)*n_w+j, nid]]
+                if t+1 < T:
+                    nxt = (t+1)*n_patches + i*n_w + j  # same patch in the next frame
+                    edges += [[nid, nxt], [nxt, nid]]
+    edge_index = torch.tensor(edges, dtype=torch.long).T
+    x_temporal = torch.tensor(np.array(all_patches), dtype=torch.float32).permute(1, 0, 2)
+    return Data(x=x, edge_index=edge_index, x_temporal=x_temporal)
 ```
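A quick way to sanity-check the graph builder is to compute the expected node and edge counts in closed form. The sketch below assumes the connectivity described above (4-neighbor spatial links within a frame, plus a link from each patch to the same patch in the next frame, all stored in both directions). The helper name `grid_graph_size` is hypothetical.

```python
def grid_graph_size(n_h: int, n_w: int, num_frames: int) -> tuple:
    """Return (num_nodes, num_directed_edges) for the patch grid over time."""
    n_patches = n_h * n_w
    nodes = num_frames * n_patches
    spatial = n_h * (n_w - 1) + (n_h - 1) * n_w   # undirected 4-neighbor links per frame
    temporal = n_patches * (num_frames - 1)       # same patch, consecutive frames
    undirected = num_frames * spatial + temporal
    return nodes, 2 * undirected                  # each link is stored in both directions


# 224 / 16 = 14 patches per side, 32 frames, as in video_to_graph's defaults.
nodes, edges = grid_graph_size(14, 14, 32)
print(nodes, edges)
```

Comparing these numbers with `data.num_nodes` and `data.edge_index.size(1)` on a real clip is a cheap check that no frames were dropped during extraction.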
+### Inference Wrapper: modules/m3_sstgnn.py
 ```python
+import torch
+from huggingface_hub import hf_hub_download
+from modules.sstgnn_model import SSTGNN
+from utils.graph import video_to_graph
+from torch_geometric.data import Batch
+class SSTGNNModule:
+    def __init__(self, cache_dir="/data/model_cache"):
+        self.device = "cuda" if torch.cuda.is_available() else "cpu"
+        ckpt_path = hf_hub_download(
+            repo_id="AkshatAgarwal/SSTGNN-deepfake",
+            filename="sstgnn_best.pt", cache_dir=cache_dir
         )
+        self.model = SSTGNN(patch_feat_dim=8, hidden_dim=128, num_frames=32)
+        self.model.load_state_dict(torch.load(ckpt_path, map_location=self.device))
+        self.model.to(self.device)
+        self.model.eval()
+    @torch.no_grad()
+    def score(self, video_path: str) -> dict:
+        if torch.cuda.is_available():
+            torch.cuda.reset_peak_memory_stats()
+        graph = video_to_graph(video_path, patch_size=16, max_frames=32)
+        batch = Batch.from_data_list([graph.to(self.device)])
+        logits = self.model(batch)
+        s3 = torch.sigmoid(logits).item()
+        vram = torch.cuda.max_memory_allocated() // (1024*1024) if torch.cuda.is_available() else 0
+        return {"s3": s3, "vram_mb": vram}
 ```
+### FALLBACK (if M3 not trained yet): modules/m3_fallback.py
 ```python
+from transformers import AutoModelForImageClassification, AutoProcessor
+import torch, cv2, numpy as np
 from PIL import Image
+class SSTGNNModule:
+    """Drop-in ViT fallback. Replace with real SSTGNN once trained."""
+    def __init__(self, cache_dir="/data/model_cache"):
+        self.device = "cuda" if torch.cuda.is_available() else "cpu"
+        self.model = AutoModelForImageClassification.from_pretrained(
+            "prithivMLmods/Deep-Fake-Detector-v2-Model", cache_dir=cache_dir
+        ).to(self.device)
+        self.processor = AutoProcessor.from_pretrained(
+            "prithivMLmods/Deep-Fake-Detector-v2-Model", cache_dir=cache_dir
+        )
+        self.model.eval()
+    @torch.no_grad()
+    def score(self, video_path: str) -> dict:
+        cap = cv2.VideoCapture(video_path)
+        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
+        indices = np.linspace(0, max(total-1,0), 16, dtype=int)
+        scores = []
+        for idx in indices:
+            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
+            ret, frame = cap.read()
+            if ret:
+                img = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
+                inputs = self.processor(images=img, return_tensors="pt")
+                inputs = {k: v.to(self.device) for k, v in inputs.items()}
+                logits = self.model(**inputs).logits
+                prob = torch.softmax(logits, dim=-1)
+                scores.append(prob[0][1].item() if prob.shape[-1] > 1 else prob[0][0].item())
+        cap.release()
+        return {"s3": sum(scores)/len(scores) if scores else 0.5, "vram_mb": 0}
 ```
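Because the fallback class exposes the same `score()` interface as the trained module, the app can pick whichever one loads. The sketch below shows one possible selection pattern, assuming each candidate is wrapped in a zero-argument factory. The helper `first_available` and the stand-in factories are hypothetical and are not part of this repository.

```python
from typing import Callable, Sequence


def first_available(factories: Sequence[Callable[[], object]]) -> object:
    """Try each zero-argument factory in order and return the first that succeeds."""
    errors = []
    for factory in factories:
        try:
            return factory()
        except Exception as exc:  # e.g. checkpoint not yet pushed to the Hub
            errors.append(exc)
    raise RuntimeError(f"No module could be loaded: {errors}")


# Stand-ins for the real constructors, used only to demonstrate the pattern.
def trained_module():
    raise FileNotFoundError("sstgnn_best.pt not found")


def fallback_module():
    return "fallback"


chosen = first_available([trained_module, fallback_module])
print(chosen)
```

In the real app the factories would construct the two `SSTGNNModule` variants. Logging which one was chosen helps avoid reporting fallback scores as if they came from the trained GNN.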
 ---
+## Module 5: Fusion MLP + NVIDIA NIM Explanation
+### modules/m5_fusion.py
 ```python
+import torch, torch.nn as nn, os
+class FusionMLP(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.fc1 = nn.Linear(3, 16)
+        self.fc2 = nn.Linear(16, 3)
+    def forward(self, s: torch.Tensor) -> tuple:
+        h = torch.relu(self.fc1(s))
+        alpha = torch.softmax(self.fc2(h), dim=-1)
+        return (alpha * s).sum(), alpha
+class FusionModule:
+    def __init__(self, weights_path="weights/fusion_mlp.pt"):
+        self.model = FusionMLP()
+        if os.path.exists(weights_path):
+            self.model.load_state_dict(torch.load(weights_path, map_location="cpu"))
+        self.model.eval()
+    def fuse(self, s1: float, s2: float, s3: float) -> dict:
+        s = torch.tensor([s1, s2, s3])
+        with torch.no_grad():
+            fakescore, alpha = self.model(s)
+        return {
+            "FakeScore": round(fakescore.item(), 4),
+            "weights": {
+                "lip_sync": round(alpha[0].item(), 3),
+                "fingerprint": round(alpha[1].item(), 3),
+                "graph_gnn": round(alpha[2].item(), 3),
+            }
+        }
 ```
+### modules/m5_explain.py (NVIDIA NIM)
+```python
+import os
+from openai import OpenAI
+class ExplainModule:
+    """
+    NVIDIA NIM free API: meta/llama-3.1-8b-instruct
+    Endpoint: https://integrate.api.nvidia.com/v1
+    Rate limit: ~40 req/min (free, no credit card)
+    """
+    def __init__(self):
+        self.client = OpenAI(
+            api_key=os.environ.get("NVIDIA_API_KEY", ""),
+            base_url="https://integrate.api.nvidia.com/v1"
+        )
+        self.model = "meta/llama-3.1-8b-instruct"
+    def explain(self, fakescore, s1, s2, s3, weights, attribution, segments, top_generator) -> str:
+        verdict = "FAKE" if fakescore > 0.5 else "REAL"
+        confidence = "high" if abs(fakescore-0.5) > 0.3 else "moderate" if abs(fakescore-0.5) > 0.15 else "low"
+        seg_text = ""
+        if segments:
+            seg_text = "Flagged timestamps: " + ", ".join(
+                [f"{s['time']}s (score={s['score']})" for s in segments[:5]]
+            )
+        attr_text = ""
+        if attribution:
+            top3 = sorted(attribution.items(), key=lambda x: -x[1])[:3]
+            attr_text = "Top generators: " + ", ".join([f"{n}: {p*100:.1f}%" for n, p in top3])
+        prompt = f"""You are a forensic AI analyst. Analyze these deepfake detection results. Be specific about evidence.
+Results:
+- Verdict: {verdict} (FakeScore: {fakescore:.3f}, confidence: {confidence})
+- Lip-Sync (M1): {s1:.3f} (weight: {weights.get('lip_sync', 'N/A')})
+- Fingerprint (M2): {s2:.3f} (weight: {weights.get('fingerprint', 'N/A')})
+- Graph-GNN (M3): {s3:.3f} (weight: {weights.get('graph_gnn', 'N/A')})
+{seg_text}
+{attr_text}
+- Most likely generator: {top_generator}
+Write 3-5 sentences. Reference specific scores and timestamps."""
+        try:
+            response = self.client.chat.completions.create(
+                model=self.model,
+                messages=[
+                    {"role": "system", "content": "You are a forensic deepfake analyst. Be precise."},
+                    {"role": "user", "content": prompt}
+                ],
+                max_tokens=300, temperature=0.3
+            )
+            return response.choices[0].message.content.strip()
+        except Exception:  # any API failure falls back to the template below
+            return self._fallback(verdict, fakescore, s1, s2, s3, top_generator, confidence)
+    def _fallback(self, verdict, fakescore, s1, s2, s3, top_gen, conf):
+        if verdict == "FAKE":
+            return (
+                f"Video classified as {verdict} with {conf} confidence (FakeScore: {fakescore:.3f}). "
+                f"Lip-sync scored {s1:.2f}, indicating "
+                f"{'significant' if s1>0.7 else 'moderate' if s1>0.5 else 'minimal'} audio-visual inconsistency. "
+                f"Style fingerprinting scored {s2:.2f}, top attribution: {top_gen}. "
+                f"Graph analysis scored {s3:.2f}."
+            )
+        return (
+            f"Video classified as {verdict} with {conf} confidence (FakeScore: {fakescore:.3f}). "
+            f"All modules returned scores below detection threshold."
+        )
 ```
 ---
+## Main App: app.py
+```python
+import gradio as gr
+import torch, time, os
+from modules.m1_lipsync import LipSyncModule
+from modules.m2_fingerprint import FingerprintModule
+# Use m3_fallback if SSTGNN not trained yet, otherwise m3_sstgnn
+from modules.m3_fallback import SSTGNNModule  # SWAP when trained
+from modules.m5_fusion import FusionModule
+from modules.m5_explain import ExplainModule
+CACHE = "/data/model_cache" if os.path.exists("/data") else "./cache"
+os.makedirs(CACHE, exist_ok=True)
+print("Loading modules...")
+m1 = LipSyncModule(cache_dir=CACHE)
+m2 = FingerprintModule(cache_dir=CACHE)
+m3 = SSTGNNModule(cache_dir=CACHE)
+m5_fusion = FusionModule(weights_path="weights/fusion_mlp.pt")
+m5_explain = ExplainModule()
+print("Ready!")
+def analyze(video_file):
+    if video_file is None:
+        return "Upload a video.", "", "", ""
+    start = time.time()
+    r1 = m1.score(video_file)
+    r2 = m2.score(video_file)
+    r3 = m3.score(video_file)
+    fusion = m5_fusion.fuse(r1["s1"], r2["s2"], r3["s3"])
+    explanation = m5_explain.explain(
+        fakescore=fusion["FakeScore"],
+        s1=r1["s1"], s2=r2["s2"], s3=r3["s3"],
+        weights=fusion["weights"],
+        attribution=r2["attribution"],
+        segments=r1.get("segments", []),
+        top_generator=r2["top_generator"]
+    )
+    elapsed = time.time() - start
+    verdict = "FAKE" if fusion["FakeScore"] > 0.5 else "REAL"
+    icon = "🔴" if verdict == "FAKE" else "🟢"
+    verdict_text = f"{icon} **{verdict}** (FakeScore: {fusion['FakeScore']:.3f})"
+    scores_text = f"""**Per-Module Scores:**
+- Lip-Sync (M1): {r1['s1']:.3f} [weight: {fusion['weights']['lip_sync']:.2f}]
+- Fingerprint (M2): {r2['s2']:.3f} [weight: {fusion['weights']['fingerprint']:.2f}]
+- Graph-GNN (M3): {r3['s3']:.3f} [weight: {fusion['weights']['graph_gnn']:.2f}]
+**Time:** {elapsed:.1f}s"""
+    attr_text = "**Generator Attribution:**\n"
+    if r2["attribution"]:
+        for gen, prob in sorted(r2["attribution"].items(), key=lambda x: -x[1]):
+            bar = "█" * int(prob * 30)
+            attr_text += f"- {gen}: {prob*100:.1f}% {bar}\n"
+    else:
+        attr_text += "- N/A (classified as real)"
+    return verdict_text, scores_text, attr_text, explanation
+with gr.Blocks(title="GenAI-DeepDetect", theme=gr.themes.Base(primary_hue="red", font=["DM Sans","sans-serif"])) as demo:
+    gr.Markdown("# GenAI-DeepDetect\n### Multimodal Deepfake Detection and Attribution\n**Modules:** LipFD | CLIP Detector | SSTGNN | Llama-3.1-8B via NVIDIA NIM")
+    with gr.Row():
+        with gr.Column(scale=1):
+            vid = gr.Video(label="Upload Video", height=300)
+            btn = gr.Button("Analyze", variant="primary", size="lg")
+        with gr.Column(scale=2):
+            v_out = gr.Markdown(label="Verdict")
+            s_out = gr.Markdown(label="Scores")
+    with gr.Row():
+        a_out = gr.Markdown(label="Attribution")
+        e_out = gr.Markdown(label="Explanation")
+    btn.click(fn=analyze, inputs=[vid], outputs=[v_out, s_out, a_out, e_out])
+    gr.Markdown("---\n**Paper:** GenAI-DeepDetect | **Authors:** Akshat Agarwal, Dev Chopda | SRM IST")
+if __name__ == "__main__":
+    demo.launch()
 ```
 ---
+## Environment Secrets (HF Space Settings)
+| Key              | Value       | Source                         |
+| ---------------- | ----------- | ------------------------------ |
+| `NVIDIA_API_KEY` | `nvapi-...` | build.nvidia.com (free signup) |
+| `HF_TOKEN`       | `hf_...`    | huggingface.co/settings/tokens |
 ---
+## NVIDIA NIM Quick Reference
+```python
+from openai import OpenAI
+client = OpenAI(api_key="nvapi-YOUR-KEY", base_url="https://integrate.api.nvidia.com/v1")
+r = client.chat.completions.create(
+    model="meta/llama-3.1-8b-instruct",
+    messages=[{"role":"user","content":"Hello"}], max_tokens=300
+)
+print(r.choices[0].message.content)
 ```
+---
+## Tonight's Timeline
+| Time      | Task                                                  | Duration |
+| --------- | ----------------------------------------------------- | -------- |
+| NOW       | Create HF Space + add NVIDIA_API_KEY secret           | 15 min   |
+| +0:15     | Clone LipFD, upload checkpoint to HF Hub              | 30 min   |
+| +0:45     | Push file structure + requirements.txt                | 15 min   |
+| +1:00     | Wire M1 + M2 + M3 fallback, test each independently   | 45 min   |
+| +1:45     | Wire M5 fusion (equal weights) + NVIDIA NIM explainer | 30 min   |
+| +2:15     | Wire app.py, test full pipeline end-to-end            | 30 min   |
+| +2:45     | Fix bugs, adjust, test edge cases                     | 45 min   |
+| +3:30     | README.md, push final                                 | 15 min   |
+| +3:45     | Collect scores, train MLP, push fusion weights        | 15 min   |
+| **+4:00** | **DONE**                                              |          |
 ---
+## Swap Guide: When SSTGNN Is Trained
+1. Train on L40S using the training script in CLAUDE.md
+2. Push weights:
+   `huggingface-cli upload AkshatAgarwal/SSTGNN-deepfake sstgnn_best.pt .`
+3. In app.py, change: `from modules.m3_fallback import SSTGNNModule` to
+   `from modules.m3_sstgnn import SSTGNNModule`
+4. Commit and push. Done.
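
The import swap in step 3 could also be driven by configuration instead of an edited source line. The following is a minimal sketch, not part of this commit: the `M3_BACKEND` environment variable and both helper names are hypothetical, and only the two module paths come from the guide above.

```python
import importlib
import os
from typing import Optional

# Module paths are taken from the swap guide; the keys are illustrative.
_M3_MODULES = {
    "fallback": "modules.m3_fallback",
    "sstgnn": "modules.m3_sstgnn",
}


def resolve_m3_module(backend: Optional[str] = None) -> str:
    """Return the dotted module path for the requested M3 backend."""
    # M3_BACKEND is an assumed variable name, defaulting to the fallback.
    name = (backend or os.environ.get("M3_BACKEND", "fallback")).strip().lower()
    if name not in _M3_MODULES:
        raise ValueError(f"unknown M3 backend: {name!r}")
    return _M3_MODULES[name]


def load_m3_class(backend: Optional[str] = None):
    """Import the selected module and return its SSTGNNModule class."""
    module = importlib.import_module(resolve_m3_module(backend))
    return module.SSTGNNModule
```

With something like this in place, `app.py` would call `load_m3_class()` once at startup, and the trained model could be enabled from the Space settings without a code change.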

app.py ADDED Viewed

	@@ -0,0 +1,104 @@

+from __future__ import annotations
+import os
+import time
+from pathlib import Path
+import gradio as gr
+from modules.m1_lipsync import LipSyncModule
+from modules.m2_fingerprint import FingerprintModule
+from modules.m3_fallback import SSTGNNModule
+from modules.m5_explain import ExplainModule
+from modules.m5_fusion import FusionModule
+CACHE = "/data/model_cache" if os.path.exists("/data") else "./cache"
+os.makedirs(CACHE, exist_ok=True)
+os.environ.setdefault("MODEL_CACHE_DIR", CACHE)
+os.environ.setdefault("INFERENCE_BACKEND", "local")
+os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
+m1 = LipSyncModule(cache_dir=CACHE)
+m2 = FingerprintModule(cache_dir=CACHE)
+m3 = SSTGNNModule(cache_dir=CACHE)
+m5_fusion = FusionModule(weights_path="weights/fusion_mlp.pt")
+m5_explain = ExplainModule()
+def analyze(video_file: str | None):
+    if not video_file:
+        return "Upload a video.", "", "", ""
+    start = time.time()
+    r1 = m1.score(video_file)
+    r2 = m2.score(video_file)
+    r3 = m3.score(video_file)
+    fusion = m5_fusion.fuse(r1["s1"], r2["s2"], r3["s3"])
+    explanation = m5_explain.explain(
+        fakescore=fusion["FakeScore"],
+        s1=r1["s1"],
+        s2=r2["s2"],
+        s3=r3["s3"],
+        weights=fusion["weights"],
+        attribution=r2["attribution"],
+        segments=r1.get("segments", []),
+        top_generator=r2["top_generator"],
+    )
+    elapsed = time.time() - start
+    verdict = "FAKE" if fusion["FakeScore"] > 0.5 else "REAL"
+    verdict_text = f"**{verdict}** (FakeScore: {fusion['FakeScore']:.3f})"
+    scores_text = (
+        "**Per-Module Scores:**\n"
+        f"- Lip-Sync (M1): {r1['s1']:.3f} [weight: {fusion['weights']['lip_sync']:.2f}]\n"
+        f"- Fingerprint (M2): {r2['s2']:.3f} [weight: {fusion['weights']['fingerprint']:.2f}]\n"
+        f"- Graph-GNN (M3): {r3['s3']:.3f} [weight: {fusion['weights']['graph_gnn']:.2f}]\n\n"
+        f"**Time:** {elapsed:.1f}s"
+    )
+    attr_text = "**Generator Attribution:**\n"
+    if r2["attribution"]:
+        for gen, prob in sorted(r2["attribution"].items(), key=lambda item: -item[1]):
+            attr_text += f"- {gen}: {prob * 100:.1f}%\n"
+    else:
+        attr_text += "- N/A (classified as real)"
+    return verdict_text, scores_text, attr_text, explanation
+with gr.Blocks(title="GenAI-DeepDetect") as demo:
+    gr.Markdown(
+        "# GenAI-DeepDetect\n"
+        "### Multimodal Deepfake Detection and Attribution\n"
+        "**Modules:** LipFD | CLIP Detector | SSTGNN | NVIDIA NIM"
+    )
+    with gr.Row():
+        with gr.Column(scale=1):
+            video = gr.Video(label="Upload Video", height=300)  # gr.Video has no `type` argument; it already passes a file path
+            button = gr.Button("Analyze", variant="primary")
+        with gr.Column(scale=2):
+            verdict_out = gr.Markdown(label="Verdict")
+            scores_out = gr.Markdown(label="Scores")
+    with gr.Row():
+        attribution_out = gr.Markdown(label="Attribution")
+        explanation_out = gr.Markdown(label="Explanation")
+    button.click(
+        fn=analyze,
+        inputs=[video],
+        outputs=[verdict_out, scores_out, attribution_out, explanation_out],
+    )
+if __name__ == "__main__":
+    demo.launch(
+        server_name="0.0.0.0",
+        server_port=int(os.environ.get("PORT", "7860")),
+    )

modules/__init__.py ADDED Viewed

	@@ -0,0 +1,16 @@

+from modules.m1_lipsync import LipSyncModule
+from modules.m2_fingerprint import FingerprintModule
+from modules.m3_fallback import SSTGNNModule as FallbackSSTGNNModule
+from modules.m3_sstgnn import SSTGNNModule
+from modules.m5_explain import ExplainModule
+from modules.m5_fusion import FusionModule
+__all__ = [
+    "ExplainModule",
+    "FallbackSSTGNNModule",
+    "FingerprintModule",
+    "FusionModule",
+    "LipSyncModule",
+    "SSTGNNModule",
+]

modules/m1_lipsync.py ADDED Viewed

	@@ -0,0 +1,35 @@

+from __future__ import annotations
+import os
+from src.engines.coherence.engine import CoherenceEngine
+from src.services.media_utils import extract_video_frames
+class LipSyncModule:
+    def __init__(self, cache_dir: str = "/data/model_cache"):
+        os.environ.setdefault("MODEL_CACHE_DIR", cache_dir)
+        self.engine = CoherenceEngine()
+    def score(self, video_path: str) -> dict:
+        frames = extract_video_frames(video_path, max_frames=60)
+        if not frames:
+            return {"s1": 0.5, "segments": [], "note": "no_frames"}
+        result = self.engine.run_video(frames, video_path)
+        segments = []
+        for marker in result.timestamp_markers[:5]:
+            correlation = float(marker.get("correlation", 0.0))
+            segments.append(
+                {
+                    "time": round(float(marker.get("start_s", 0.0)), 2),
+                    "score": round(max(0.0, min(1.0, 1.0 - correlation)), 3),
+                }
+            )
+        return {
+            "s1": round(float(result.confidence), 4),
+            "segments": segments,
+            "note": result.explanation,
+        }
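
The segment conversion in `score` can be read in isolation: each marker's correlation is inverted and clamped so that a higher value means worse lip/audio agreement. A small sketch of that mapping follows. The helper names `segment_score` and `to_segments` are illustrative, and the marker dictionaries are assumed to carry the `start_s` and `correlation` keys used above.

```python
def segment_score(correlation: float) -> float:
    """Map a lip/audio correlation to a [0, 1] anomaly score (higher = worse sync)."""
    return round(max(0.0, min(1.0, 1.0 - correlation)), 3)


def to_segments(markers: list, limit: int = 5) -> list:
    """Convert up to `limit` raw markers into {time, score} dictionaries."""
    segments = []
    for marker in markers[:limit]:
        segments.append(
            {
                "time": round(float(marker.get("start_s", 0.0)), 2),
                "score": segment_score(float(marker.get("correlation", 0.0))),
            }
        )
    return segments


example = to_segments([{"start_s": 1.234, "correlation": 0.1}, {"correlation": -0.4}])
```

Negative correlations clamp to a score of 1.0, and a marker with no `correlation` key is treated as fully out of sync because the default of 0.0 maps to 1.0.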

modules/m2_fingerprint.py ADDED Viewed

	@@ -0,0 +1,44 @@

+from __future__ import annotations
+import os
+from src.engines.fingerprint.engine import FingerprintEngine
+from src.services.media_utils import extract_video_frames
+_DISPLAY_NAMES = {
+    "real": "Real",
+    "sora": "Sora",
+    "runway": "Runway Gen-2",
+    "wav2lip": "Wav2Lip",
+    "stable_diffusion": "Stable Diffusion v1.5",
+    "sdxl": "SDXL",
+    "midjourney": "Midjourney v6",
+    "dall_e": "DALL-E 3",
+    "unknown_generative": "Unknown/OOD",
+}
+class FingerprintModule:
+    def __init__(self, cache_dir: str = "/data/model_cache"):
+        os.environ.setdefault("MODEL_CACHE_DIR", cache_dir)
+        self.engine = FingerprintEngine()
+    def score(self, video_path: str) -> dict:
+        frames = extract_video_frames(video_path, max_frames=60)
+        if not frames:
+            return {"s2": 0.5, "attribution": {}, "top_generator": "Unknown/OOD"}
+        result = self.engine.run_video(frames)
+        generator = result.attributed_generator or "unknown_generative"
+        top_generator = _DISPLAY_NAMES.get(generator, generator)
+        attribution = {}
+        if result.confidence > 0.5:
+            attribution[top_generator] = 1.0
+        return {
+            "s2": round(float(result.confidence), 4),
+            "attribution": attribution,
+            "top_generator": top_generator,
+        }

modules/m3_fallback.py ADDED Viewed

	@@ -0,0 +1,21 @@

+from __future__ import annotations
+import os
+from src.engines.sstgnn.engine import SSTGNNEngine
+from src.services.media_utils import extract_video_frames
+class SSTGNNModule:
+    def __init__(self, cache_dir: str = "/data/model_cache"):
+        os.environ.setdefault("MODEL_CACHE_DIR", cache_dir)
+        self.engine = SSTGNNEngine()
+    def score(self, video_path: str) -> dict:
+        frames = extract_video_frames(video_path, max_frames=60)
+        if not frames:
+            return {"s3": 0.5, "vram_mb": 0}
+        result = self.engine.run_video(frames)
+        return {"s3": round(float(result.confidence), 4), "vram_mb": 0}

modules/m3_sstgnn.py ADDED Viewed

	@@ -0,0 +1,4 @@


+from modules.m3_fallback import SSTGNNModule
+__all__ = ["SSTGNNModule"]

modules/m5_explain.py ADDED Viewed

	@@ -0,0 +1,74 @@

+from __future__ import annotations
+from src.explainability.explainer import explain
+from src.types import EngineResult
+_GENERATOR_NAMES = {
+    "Real": "real",
+    "Sora": "sora",
+    "Runway Gen-2": "runway",
+    "Wav2Lip": "wav2lip",
+    "Stable Diffusion v1.5": "stable_diffusion",
+    "SDXL": "sdxl",
+    "Midjourney v6": "midjourney",
+    "DALL-E 3": "dall_e",
+    "Unknown/OOD": "unknown_generative",
+}
+class ExplainModule:
+    def explain(
+        self,
+        fakescore: float,
+        s1: float,
+        s2: float,
+        s3: float,
+        weights: dict,
+        attribution: dict,
+        segments: list,
+        top_generator: str,
+    ) -> str:
+        seg_text = "none"
+        if segments:
+            seg_text = ", ".join(
+                f"{segment['time']}s ({segment['score']:.2f})" for segment in segments[:5]
+            )
+        attr_text = "none"
+        if attribution:
+            attr_text = ", ".join(
+                f"{name}: {prob * 100:.1f}%" for name, prob in attribution.items()
+            )
+        engine_results = [
+            EngineResult(
+                engine="lip_sync",
+                verdict="FAKE" if s1 > 0.5 else "REAL",
+                confidence=s1,
+                explanation=(
+                    f"Weight {weights.get('lip_sync', 0.0):.2f}. "
+                    f"Flagged timestamps: {seg_text}."
+                ),
+            ),
+            EngineResult(
+                engine="fingerprint",
+                verdict="FAKE" if s2 > 0.5 else "REAL",
+                confidence=s2,
+                attributed_generator=_GENERATOR_NAMES.get(top_generator, "unknown_generative"),
+                explanation=(
+                    f"Weight {weights.get('fingerprint', 0.0):.2f}. "
+                    f"Attribution: {attr_text}."
+                ),
+            ),
+            EngineResult(
+                engine="graph_gnn",
+                verdict="FAKE" if s3 > 0.5 else "REAL",
+                confidence=s3,
+                explanation=f"Weight {weights.get('graph_gnn', 0.0):.2f}.",
+            ),
+        ]
+        verdict = "FAKE" if fakescore > 0.5 else "REAL"
+        generator = _GENERATOR_NAMES.get(top_generator, "unknown_generative")
+        return explain(verdict, fakescore, engine_results, generator)
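
`_GENERATOR_NAMES` here is the inverse of `_DISPLAY_NAMES` in `modules/m2_fingerprint.py`, so the two tables have to be kept in sync by hand. One option, sketched below under the assumption that the display labels stay unique, is to derive the inverse from a single table. The names `DISPLAY_NAMES`, `GENERATOR_NAMES` and `to_internal` are illustrative and not part of this commit.

```python
# Single source of truth, copied from the display-name table in m2_fingerprint.py.
DISPLAY_NAMES = {
    "real": "Real",
    "sora": "Sora",
    "runway": "Runway Gen-2",
    "wav2lip": "Wav2Lip",
    "stable_diffusion": "Stable Diffusion v1.5",
    "sdxl": "SDXL",
    "midjourney": "Midjourney v6",
    "dall_e": "DALL-E 3",
    "unknown_generative": "Unknown/OOD",
}

# Derived inverse: display label -> internal key.
GENERATOR_NAMES = {label: key for key, label in DISPLAY_NAMES.items()}


def to_internal(label: str) -> str:
    """Return the internal generator key, defaulting to the OOD bucket."""
    return GENERATOR_NAMES.get(label, "unknown_generative")
```

If two keys ever shared a display label, the derived dictionary would silently drop one of them, so a length check between the two tables is a cheap safeguard.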

modules/m5_fusion.py ADDED Viewed

	@@ -0,0 +1,40 @@

+from __future__ import annotations
+import os
+import torch
+import torch.nn as nn
+class FusionMLP(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.fc1 = nn.Linear(3, 16)
+        self.fc2 = nn.Linear(16, 3)
+    def forward(self, scores: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
+        hidden = torch.relu(self.fc1(scores))
+        alpha = torch.softmax(self.fc2(hidden), dim=-1)
+        return (alpha * scores).sum(), alpha
+class FusionModule:
+    def __init__(self, weights_path: str = "weights/fusion_mlp.pt"):
+        self.model = FusionMLP()
+        if os.path.exists(weights_path):
+            self.model.load_state_dict(torch.load(weights_path, map_location="cpu"))
+        self.model.eval()
+    def fuse(self, s1: float, s2: float, s3: float) -> dict:
+        scores = torch.tensor([s1, s2, s3], dtype=torch.float32)
+        with torch.no_grad():
+            fakescore, alpha = self.model(scores)
+        return {
+            "FakeScore": round(float(fakescore.item()), 4),
+            "weights": {
+                "lip_sync": round(float(alpha[0].item()), 3),
+                "fingerprint": round(float(alpha[1].item()), 3),
+                "graph_gnn": round(float(alpha[2].item()), 3),
+            },
+        }
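
Because `alpha` comes out of a softmax, the fused score is a convex combination of the three module scores: it always lies between the lowest and highest input. The sketch below reproduces only that arithmetic in plain Python. The logits are supplied by hand here, whereas in `FusionMLP` they come from `fc2(relu(fc1(scores)))`, and the function names are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(scores, logits):
    """Weighted sum of scores using softmax(logits) as the weights."""
    alpha = softmax(logits)
    return sum(a * s for a, s in zip(alpha, scores)), alpha


scores = [0.9, 0.2, 0.4]
# All-zero logits give equal weights, so the result is the plain mean.
equal_score, equal_alpha = fuse(scores, [0.0, 0.0, 0.0])
# A larger first logit shifts weight toward the first module.
skewed_score, skewed_alpha = fuse(scores, [2.0, 0.0, 0.0])
```

One consequence of this design is that the fused score can never exceed the most suspicious single module, so a strong signal from one module is diluted rather than amplified by the others.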

packages.txt ADDED Viewed

	@@ -0,0 +1,3 @@


+ffmpeg
+libsndfile1-dev

requirements.txt CHANGED Viewed

@@ -6,6 +6,7 @@ aiofiles>=23.2.1
 httpx>=0.27.0
 pydantic>=2.7.0
 python-dotenv>=1.0.1
 # ML - fingerprint
 transformers>=4.40.0
@@ -15,8 +16,10 @@ torchvision>=0.21.0
 torchaudio>=2.6.0
 # ML - coherence
-# facenet-pytorch currently has limited support on newer Python versions.
-facenet-pytorch>=2.5.3; python_version < "3.13"
 mediapipe>=0.10.14
 opencv-python-headless>=4.9.0
 librosa>=0.10.2
@@ -25,9 +28,8 @@ librosa>=0.10.2
 torch-geometric>=2.5.0
 scipy>=1.13.0
-# Explainability - Gemini
-google-genai>=1.0.0
-google-generativeai>=0.8.0
 # HuggingFace
 huggingface-hub>=0.23.0

 httpx>=0.27.0
 pydantic>=2.7.0
 python-dotenv>=1.0.1
+gradio>=4.0.0
 # ML - fingerprint
 transformers>=4.40.0
 torchaudio>=2.6.0
 # ML - coherence
+# facenet-pytorch requires numpy<2.0 which cannot build on Python 3.14+.
+# On Python 3.14+ the engine automatically falls back to torchvision ResNet-18.
+# Use Python <=3.12 in production for full facenet-pytorch support.
+facenet-pytorch>=2.5.3; python_version < "3.14"
 mediapipe>=0.10.14
 opencv-python-headless>=4.9.0
 librosa>=0.10.2
 torch-geometric>=2.5.0
 scipy>=1.13.0
+# Explainability - NVIDIA NIM
+openai>=1.0.0
 # HuggingFace
 huggingface-hub>=0.23.0

runpod_handler.py CHANGED Viewed

@@ -46,13 +46,12 @@ def handler(job: dict) -> dict:
             tmp_path = temp.name
         try:
-            frames = extract_video_frames(tmp_path, max_frames=300)
         finally:
             os.unlink(tmp_path)
-        fp = _fp.run_video(frames)
-        co = _co.run_video(frames)
-        st = _st.run_video(frames)
         verdict, conf, generator = fuse([fp, co, st], is_video=True)
     engine_results = [fp, co, st]

             tmp_path = temp.name
         try:
+            frames = extract_video_frames(tmp_path, max_frames=60)
+            fp = _fp.run_video(frames)
+            co = _co.run_video(frames, tmp_path)  # tmp_path must still exist here: the audio lip-sync analysis reads it
+            st = _st.run_video(frames)
         finally:
             os.unlink(tmp_path)
         verdict, conf, generator = fuse([fp, co, st], is_video=True)
     engine_results = [fp, co, st]
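
The change above moves the engine calls inside the `try` so that the temporary file still exists while the coherence engine reads its audio track. The pattern can be sketched on its own as follows. `process_upload` and the probe callback are hypothetical stand-ins for the handler and the engines.

```python
import os
import tempfile


def process_upload(payload: bytes, analyze):
    """Write payload to a temp file, run analyze(path), then delete the file."""
    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as temp:
        temp.write(payload)
        tmp_path = temp.name
    try:
        # Everything that reads tmp_path has to run before the finally block.
        return analyze(tmp_path)
    finally:
        os.unlink(tmp_path)


seen = {}


def _probe(path: str) -> int:
    """Stand-in for an engine: record that the file is readable, return its size."""
    seen["path"] = path
    seen["exists"] = os.path.exists(path)
    return os.path.getsize(path)


size = process_upload(b"abc", _probe)
```

Returning from inside the `try` still triggers the `finally`, so the file is removed whether the analysis succeeds or raises.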

src/api/main.py CHANGED Viewed

@@ -244,7 +244,8 @@ def _model_inventory() -> dict[str, object]:
             "graph_component": "scipy.spatial.Delaunay + MediaPipe landmarks",
         },
         "explainability": {
-            "gemini_model_candidates": list(MODEL_CANDIDATES),
         },
         "generator_labels": SUPPORTED_GENERATORS,
     }

             "graph_component": "scipy.spatial.Delaunay + MediaPipe landmarks",
         },
         "explainability": {
+            "nvidia_model_candidates": list(MODEL_CANDIDATES),
+            "provider": "NVIDIA NIM",
         },
         "generator_labels": SUPPORTED_GENERATORS,
     }

src/engines/coherence/engine.py CHANGED Viewed

@@ -23,6 +23,9 @@ _mtcnn = None
 _resnet = None
 _face_mesh = None
 _torch = None
 def _skip_model_loads() -> bool:
@@ -106,7 +109,8 @@ def _build_face_mesh():
 def _load() -> None:
-    global _mtcnn, _resnet, _face_mesh, _load_attempted, _torch
     if _load_attempted:
         return
@@ -123,23 +127,49 @@ def _load() -> None:
         logger.warning("Coherence FaceMesh unavailable: %s", _short_error(exc))
     try:
-        from facenet_pytorch import InceptionResnetV1, MTCNN  # type: ignore
-        _mtcnn = MTCNN(keep_all=False, device="cpu")
-        _resnet = InceptionResnetV1(pretrained="vggface2").eval()
-        try:
-            import torch  # type: ignore
-            _torch = torch
-        except Exception:
-            _torch = None
     except Exception as exc:
         logger.warning(
-            "Coherence embedding model load failed, using heuristic-only mode: %s",
             _short_error(exc),
         )
     logger.info("Coherence model load attempt complete")
@@ -234,14 +264,12 @@ class CoherenceEngine:
         blink = self._blink_anomaly(frames)
         visual_score = float(np.clip(delta * 0.45 + jerk * 0.35 + blink * 0.20, 0.0, 1.0))
-        # Audio lip-sync cross-correlation (LipFD-inspired, paper §III-A)
         audio_anomaly: Optional[float] = None
         timestamp_markers: list[dict] = []
         if video_path is not None:
             audio_anomaly, timestamp_markers = self._audio_lipsync_score(video_path, frames)
         if audio_anomaly is not None:
-            # Weighted: visual 60%, audio 40% (paper weights for Module 1)
             score = float(np.clip(visual_score * 0.60 + audio_anomaly * 0.40, 0.0, 1.0))
             explanation = (
                 f"Embedding variance {delta:.2f}, landmark jerk {jerk:.2f}, "
@@ -275,16 +303,6 @@ class CoherenceEngine:
     ) -> tuple[float, list[dict]]:
         """
         MFCC cross-correlation with lip-aperture motion curve (paper §III-A).
-        Extracts mono 16 kHz audio via ffmpeg, computes MFCC energy envelope,
-        computes per-frame lip-aperture from MediaPipe, resamples both to the
-        same length, and returns the Pearson correlation as an anomaly score.
-        Returns:
-            (sync_anomaly_score, timestamp_markers)
-            sync_anomaly_score: 0 = perfectly in sync, 1 = totally out of sync
-            timestamp_markers: list of {start_s, end_s, correlation} dicts for
-                               segments where correlation < 0.2
         """
         try:
             import librosa  # type: ignore
@@ -301,7 +319,7 @@ class CoherenceEngine:
             cmd = [
                 "ffmpeg", "-i", video_path,
                 "-ac", "1", "-ar", "16000",
-                "-vn",           # no video output
                 "-f", "wav",
                 audio_path,
                 "-y", "-loglevel", "error",
@@ -320,9 +338,8 @@ class CoherenceEngine:
             Path(audio_path).unlink(missing_ok=True)
         if len(y) < sr * 0.5:
-            return 0.35, []  # less than 0.5 s of audio → inconclusive
-        # Audio energy envelope from MFCC
         hop_length = 512
         try:
             mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop_length)
@@ -331,7 +348,6 @@ class CoherenceEngine:
             logger.warning("MFCC computation failed: %s", exc)
             return 0.35, []
-        # Lip-aperture curve from MediaPipe (inner upper lip=13, lower=14)
         if _face_mesh is None:
             return 0.35, []
@@ -351,9 +367,8 @@ class CoherenceEngine:
                 lip_apertures.append(0.0)
         if len(lip_apertures) < 4 or float(np.std(lip_apertures)) < 1e-6:
-            return 0.35, []  # static lip → can't measure sync
-        # Resample lip curve to match audio_curve length
         lip_curve = np.array(lip_apertures, dtype=np.float32)
         target_len = len(audio_curve)
         lip_resampled = np.interp(
@@ -365,18 +380,15 @@ class CoherenceEngine:
         if target_len < 4:
             return 0.35, []
-        # Overall Pearson correlation
         try:
             r_overall, _ = pearsonr(audio_curve, lip_resampled)
         except Exception:
             r_overall = 0.0
-        # Map correlation → anomaly score
-        # Real speech: r typically > 0.3; deepfake: often < 0.1 or negative
         sync_anomaly = float(np.clip((0.3 - float(r_overall)) / 0.5 + 0.35, 0.0, 1.0))
-        # Sliding-window timestamp markers for low-correlation segments
-        hop_s = hop_length / sr  # seconds per MFCC frame
         markers: list[dict] = []
         window = max(10, target_len // 10)
         stride = max(1, window // 2)
@@ -385,6 +397,7 @@ class CoherenceEngine:
             seg_audio = audio_curve[i : i + window]
             seg_lip = lip_resampled[i : i + window]
             try:
                 r_seg, _ = pearsonr(seg_audio, seg_lip)
             except Exception:
                 continue
@@ -398,26 +411,66 @@ class CoherenceEngine:
         return sync_anomaly, markers
     def _embedding_variance(self, frames: list[np.ndarray]) -> float:
-        if _mtcnn is None or _resnet is None or _torch is None:
             return 0.5
-        embeddings: list[np.ndarray] = []
         for frame in frames[::4]:
             try:
-                face = _mtcnn(Image.fromarray(frame))
-                if face is not None:
-                    with _torch.no_grad():
-                        emb = _resnet(face.unsqueeze(0)).detach().cpu().numpy()[0]
-                    embeddings.append(emb)
             except Exception:
                 continue
-        if len(embeddings) < 2:
             return 0.5
         deltas = [
-            float(np.linalg.norm(embeddings[index + 1] - embeddings[index]))
-            for index in range(len(embeddings) - 1)
         ]
         return float(np.clip(np.var(deltas) * 8.0, 0.0, 1.0))

 _resnet = None
 _face_mesh = None
 _torch = None
+_device = "cpu"  # updated to "cuda" in _load() when GPU is available
+_resnet_fallback = None   # torchvision ResNet-18 used when facenet-pytorch unavailable
+_transform_fallback = None
 def _skip_model_loads() -> bool:
 def _load() -> None:
+    global _mtcnn, _resnet, _face_mesh, _load_attempted, _torch, _device
+    global _resnet_fallback, _transform_fallback
     if _load_attempted:
         return
         logger.warning("Coherence FaceMesh unavailable: %s", _short_error(exc))
     try:
+        import torch  # type: ignore
+        _torch = torch
+        _device = "cuda" if torch.cuda.is_available() else "cpu"
+        logger.info("  Coherence device: %s", _device)
+        from facenet_pytorch import InceptionResnetV1, MTCNN  # type: ignore
+        _mtcnn = MTCNN(keep_all=False, device=_device)
+        _resnet = InceptionResnetV1(pretrained="vggface2").eval().to(_device)
+        logger.info("  FaceNet loaded on %s", _device)
     except Exception as exc:
         logger.warning(
+            "Coherence facenet-pytorch unavailable (%s); trying torchvision fallback.",
             _short_error(exc),
         )
+        try:
+            import torch  # type: ignore
+            import torchvision.models as tv_models  # type: ignore
+            import torchvision.transforms as tv_transforms  # type: ignore
+            _torch = torch
+            _device = "cuda" if torch.cuda.is_available() else "cpu"
+            model = tv_models.resnet18(weights=tv_models.ResNet18_Weights.DEFAULT)
+            model.fc = torch.nn.Identity()  # strip classifier → 512-d embedding
+            _resnet_fallback = model.eval().to(_device)
+            _transform_fallback = tv_transforms.Compose([
+                tv_transforms.Resize((224, 224)),
+                tv_transforms.ToTensor(),
+                tv_transforms.Normalize(
+                    mean=[0.485, 0.456, 0.406],
+                    std=[0.229, 0.224, 0.225],
+                ),
+            ])
+            logger.info("  torchvision ResNet-18 fallback loaded on %s", _device)
+        except Exception as exc2:
+            logger.warning(
+                "Coherence embedding fallback also failed, heuristic-only mode: %s",
+                _short_error(exc2),
+            )
     logger.info("Coherence model load attempt complete")
         blink = self._blink_anomaly(frames)
         visual_score = float(np.clip(delta * 0.45 + jerk * 0.35 + blink * 0.20, 0.0, 1.0))
         audio_anomaly: Optional[float] = None
         timestamp_markers: list[dict] = []
         if video_path is not None:
             audio_anomaly, timestamp_markers = self._audio_lipsync_score(video_path, frames)
         if audio_anomaly is not None:
             score = float(np.clip(visual_score * 0.60 + audio_anomaly * 0.40, 0.0, 1.0))
             explanation = (
                 f"Embedding variance {delta:.2f}, landmark jerk {jerk:.2f}, "
     ) -> tuple[float, list[dict]]:
         """
         MFCC cross-correlation with lip-aperture motion curve (paper §III-A).
         """
         try:
             import librosa  # type: ignore
             cmd = [
                 "ffmpeg", "-i", video_path,
                 "-ac", "1", "-ar", "16000",
+                "-vn",
                 "-f", "wav",
                 audio_path,
                 "-y", "-loglevel", "error",
             Path(audio_path).unlink(missing_ok=True)
         if len(y) < sr * 0.5:
+            return 0.35, []
         hop_length = 512
         try:
             mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop_length)
             logger.warning("MFCC computation failed: %s", exc)
             return 0.35, []
         if _face_mesh is None:
             return 0.35, []
                 lip_apertures.append(0.0)
         if len(lip_apertures) < 4 or float(np.std(lip_apertures)) < 1e-6:
+            return 0.35, []
         lip_curve = np.array(lip_apertures, dtype=np.float32)
         target_len = len(audio_curve)
         lip_resampled = np.interp(
         if target_len < 4:
             return 0.35, []
         try:
+            from scipy.stats import pearsonr  # type: ignore
             r_overall, _ = pearsonr(audio_curve, lip_resampled)
         except Exception:
             r_overall = 0.0
         sync_anomaly = float(np.clip((0.3 - float(r_overall)) / 0.5 + 0.35, 0.0, 1.0))
+        hop_s = hop_length / sr
         markers: list[dict] = []
         window = max(10, target_len // 10)
         stride = max(1, window // 2)
             seg_audio = audio_curve[i : i + window]
             seg_lip = lip_resampled[i : i + window]
             try:
+                from scipy.stats import pearsonr  # type: ignore
                 r_seg, _ = pearsonr(seg_audio, seg_lip)
             except Exception:
                 continue
         return sync_anomaly, markers
     def _embedding_variance(self, frames: list[np.ndarray]) -> float:
+        if _torch is None:
             return 0.5
+        # --- facenet-pytorch path (preferred) ---
+        if _mtcnn is not None and _resnet is not None:
+            embeddings: list[np.ndarray] = []
+            for frame in frames[::4]:
+                try:
+                    face = _mtcnn(Image.fromarray(frame))
+                    if face is not None:
+                        face_gpu = face.unsqueeze(0).to(_device)
+                        with _torch.no_grad():
+                            with _torch.cuda.amp.autocast(enabled=(_device == "cuda")):
+                                emb = _resnet(face_gpu).detach().float().cpu().numpy()[0]
+                        embeddings.append(emb)
+                except Exception:
+                    continue
+            if len(embeddings) >= 2:
+                deltas = [
+                    float(np.linalg.norm(embeddings[i + 1] - embeddings[i]))
+                    for i in range(len(embeddings) - 1)
+                ]
+                return float(np.clip(np.var(deltas) * 8.0, 0.0, 1.0))
+            return 0.5
+        # --- torchvision ResNet-18 fallback (Python 3.14+, no facenet-pytorch) ---
+        if _resnet_fallback is None or _transform_fallback is None or _face_mesh is None:
+            return 0.5
+        embeddings_fb: list[np.ndarray] = []
         for frame in frames[::4]:
             try:
+                res = _face_mesh.process(frame)
+                if not res.multi_face_landmarks:
+                    continue
+                lm = res.multi_face_landmarks[0].landmark
+                h, w = frame.shape[:2]
+                xs = [l.x * w for l in lm]
+                ys = [l.y * h for l in lm]
+                x1 = max(0, int(min(xs)) - 10)
+                x2 = min(w, int(max(xs)) + 10)
+                y1 = max(0, int(min(ys)) - 10)
+                y2 = min(h, int(max(ys)) + 10)
+                if x2 - x1 < 20 or y2 - y1 < 20:
+                    continue
+                crop = Image.fromarray(frame[y1:y2, x1:x2]).convert("RGB")
+                tensor = _transform_fallback(crop).unsqueeze(0).to(_device)
+                with _torch.no_grad():
+                    with _torch.cuda.amp.autocast(enabled=(_device == "cuda")):
+                        emb = _resnet_fallback(tensor).detach().float().cpu().numpy()[0]
+                embeddings_fb.append(emb)
             except Exception:
                 continue
+        if len(embeddings_fb) < 2:
             return 0.5
         deltas = [
+            float(np.linalg.norm(embeddings_fb[i + 1] - embeddings_fb[i]))
+            for i in range(len(embeddings_fb) - 1)
         ]
         return float(np.clip(np.var(deltas) * 8.0, 0.0, 1.0))
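
Reading aid for the coherence-engine hunks above: a minimal standalone sketch of the two scoring formulas they contain, the correlation-to-anomaly mapping from `_audio_lipsync_score` and the delta-variance score from `_embedding_variance`. It assumes plain Python lists and uses only the standard library in place of NumPy. The function names are illustrative and do not exist in the repository.

```python
import math
import statistics


def sync_anomaly_from_correlation(r_overall: float) -> float:
    """Map a Pearson correlation to [0, 1]; weaker lip/audio correlation scores higher."""
    raw = (0.3 - r_overall) / 0.5 + 0.35
    return min(1.0, max(0.0, raw))


def embedding_variance_score(embeddings: list[list[float]]) -> float:
    """Variance of consecutive embedding distances, scaled by 8 and clipped to [0, 1]."""
    if len(embeddings) < 2:
        return 0.5  # neutral value, mirroring the fallback in the diff
    # Euclidean distance between each pair of neighbouring embeddings
    deltas = [math.dist(a, b) for a, b in zip(embeddings, embeddings[1:])]
    # pvariance is the population variance, which is what np.var computes by default
    return min(1.0, max(0.0, statistics.pvariance(deltas) * 8.0))
```

With these constants the sync score saturates at 1.0 once the correlation falls to about -0.025 and reaches 0.0 at about 0.475, so only a fairly narrow band of correlations produces intermediate values.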

src/engines/fingerprint/engine.py CHANGED

@@ -22,6 +22,10 @@ from src.types import EngineResult
 logger = logging.getLogger(__name__)
 CACHE = os.environ.get("MODEL_CACHE_DIR", "/tmp/models")
 DETECTOR_CANDIDATES = [
     "Organika/sdxl-detector",
     "haywoodsloan/ai-image-detector-deploy",
@@ -70,8 +74,6 @@ _clip_model: Optional[CLIPModel] = None
 _clip_processor: Optional[CLIPProcessor] = None
 _loaded = False
-# Thread-local storage: each request thread stores its last CLIP embedding here
-# so the novelty detector can consume it without a second forward pass.
 _thread_local = threading.local()
@@ -92,16 +94,19 @@ def _short_error(exc: Exception, *, limit: int = 300) -> str:
 def _build_detector(model_id: str) -> Any:
     hf_pipeline = _get_pipeline()
-    # Some transformer builds reject cache_dir in pipeline init.
-    attempts = ({"cache_dir": CACHE}, {})
     last_exc: Exception | None = None
     for kwargs in attempts:
         try:
             return hf_pipeline("image-classification", model=model_id, **kwargs)
         except Exception as exc:
             last_exc = exc
     if last_exc is not None:
         raise last_exc
     raise RuntimeError(f"Unable to load fingerprint detector pipeline for {model_id}")
@@ -112,7 +117,7 @@ def _load() -> None:
     if _loaded:
         return
-    logger.info("Fingerprint engine: loading models...")
     for model_id in DETECTOR_CANDIDATES:
         try:
@@ -126,24 +131,28 @@ def _load() -> None:
         logger.error("Fingerprint engine: no detectors loaded; using neutral fallback score.")
     try:
         _clip_model = CLIPModel.from_pretrained(
             "openai/clip-vit-large-patch14",
             cache_dir=CACHE,
-        )
         _clip_processor = CLIPProcessor.from_pretrained(
             "openai/clip-vit-large-patch14",
             cache_dir=CACHE,
         )
         _clip_model.eval()
-        logger.info("  CLIP loaded for generator attribution")
     except Exception as exc:
         logger.warning("  CLIP unavailable: %s", _short_error(exc))
     _loaded = True
     logger.info(
-        "Fingerprint engine ready: %s detectors, CLIP=%s",
         len(_detectors),
         "ok" if _clip_model else "missing",
     )
@@ -183,9 +192,6 @@ class FingerprintEngine:
         if image.mode != "RGB":
             image = image.convert("RGB")
-        if not _detectors:
-            logger.warning("No fingerprint detectors loaded; using neutral fallback score.")
         detector_weights = [0.4, 0.3, 0.2, 0.1]
         total_w = 0.0
         weighted_fake = 0.0
@@ -203,7 +209,6 @@ class FingerprintEngine:
         ensemble_score = (weighted_fake / total_w) if total_w > 0 else 0.5
-        # DCT frequency band analysis (paper §III-B / Kim et al.)
         dct_score = self._dct_frequency_score(image)
         fake_score = float(np.clip(ensemble_score * 0.85 + dct_score * 0.15, 0.0, 1.0))
@@ -236,17 +241,19 @@ class FingerprintEngine:
                 truncation=True,
                 max_length=77,
             )
             with torch.no_grad():
-                outputs = _clip_model(**inputs)
-                logits = outputs.logits_per_image[0]
-                # Store image embedding for novelty detection
-                image_embeds = outputs.image_embeds.detach().cpu().numpy()[0]
-                _thread_local.last_clip_embedding = image_embeds
             probs = logits.softmax(dim=0).cpu().numpy()
             max_prob = float(np.max(probs))
-            # Low confidence attribution → unknown generator (9 classes: chance=0.11, threshold=2.9×)
             if max_prob < 0.32:
                 generator = "unknown_generative"
             else:
@@ -262,24 +269,70 @@ class FingerprintEngine:
             _thread_local.last_clip_embedding = None
             return "unknown_generative" if fake_score > 0.5 else "real"
-    def _dct_frequency_score(self, image: Image.Image) -> float:
         """
-        DCT frequency band analysis (paper §III-B).
-        High-frequency energy ratio is an anomaly signal: real photos follow
-        a predictable DCT energy roll-off; AI generators often deviate.
-        Returns float [0, 1] where higher = more anomalous.
         """
         try:
             from scipy.fft import dctn  # type: ignore
             gray = np.array(image.convert("L"), dtype=np.float32)
             h, w = gray.shape
-            # Align to 8×8 block boundary (JPEG-DCT standard)
             bh, bw = h - h % 8, w - w % 8
             if bh < 8 or bw < 8:
                 return 0.3
             crop = gray[:bh, :bw]
-            # Reshape into (n_blocks_h, n_blocks_w, 8, 8) then DCT each 8×8 block
             blocks = crop.reshape(bh // 8, 8, bw // 8, 8).transpose(0, 2, 1, 3)
             n_bh, n_bw = blocks.shape[:2]
@@ -295,9 +348,7 @@ class FingerprintEngine:
                 return 0.3
             ac_ratio = 1.0 - (dc_energy_total / all_energy_total)
-            # Real photos: ac_ratio ≈ 0.80–0.90; AI images can deviate significantly
-            score = float(np.clip(abs(ac_ratio - 0.85) / 0.15, 0.0, 1.0))
-            return score
         except Exception as exc:
             logger.warning("DCT frequency score error: %s", _short_error(exc))
             return 0.3
@@ -317,11 +368,33 @@ class FingerprintEngine:
                 processing_time_ms=0.0,
             )
         keyframes = frames[::8] or [frames[0]]
-        results = [self.run(Image.fromarray(frame)) for frame in keyframes]
-        avg_conf = float(np.mean([result.confidence for result in results]))
-        generators = [result.attributed_generator for result in results if result.attributed_generator]
         top_gen = max(set(generators), key=generators.count) if generators else "unknown_generative"
         return EngineResult(

 logger = logging.getLogger(__name__)
 CACHE = os.environ.get("MODEL_CACHE_DIR", "/tmp/models")
+# GPU device selection — A100 / any CUDA GPU if available, else CPU
+_DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+_PIPELINE_DEVICE = 0 if _DEVICE == "cuda" else -1  # HF pipeline convention
 DETECTOR_CANDIDATES = [
     "Organika/sdxl-detector",
     "haywoodsloan/ai-image-detector-deploy",
 _clip_processor: Optional[CLIPProcessor] = None
 _loaded = False
 _thread_local = threading.local()
 def _build_detector(model_id: str) -> Any:
     hf_pipeline = _get_pipeline()
+    # Try GPU first, fall back to CPU-only variants
+    attempts: tuple[dict, ...] = (
+        {"cache_dir": CACHE, "device": _PIPELINE_DEVICE},
+        {"device": _PIPELINE_DEVICE},
+        {"cache_dir": CACHE},
+        {},
+    )
     last_exc: Exception | None = None
     for kwargs in attempts:
         try:
             return hf_pipeline("image-classification", model=model_id, **kwargs)
         except Exception as exc:
             last_exc = exc
     if last_exc is not None:
         raise last_exc
     raise RuntimeError(f"Unable to load fingerprint detector pipeline for {model_id}")
     if _loaded:
         return
+    logger.info("Fingerprint engine: loading models on device=%s ...", _DEVICE)
     for model_id in DETECTOR_CANDIDATES:
         try:
         logger.error("Fingerprint engine: no detectors loaded; using neutral fallback score.")
     try:
+        # Load CLIP in FP16 on CUDA for ~2× speed + half memory on A100
+        dtype = torch.float16 if _DEVICE == "cuda" else torch.float32
         _clip_model = CLIPModel.from_pretrained(
             "openai/clip-vit-large-patch14",
             cache_dir=CACHE,
+            torch_dtype=dtype,
+        ).to(_DEVICE)
         _clip_processor = CLIPProcessor.from_pretrained(
             "openai/clip-vit-large-patch14",
             cache_dir=CACHE,
         )
         _clip_model.eval()
+        logger.info("  CLIP loaded on %s (dtype=%s)", _DEVICE, dtype)
     except Exception as exc:
         logger.warning("  CLIP unavailable: %s", _short_error(exc))
     _loaded = True
     logger.info(
+        "Fingerprint engine ready: %s detectors, CLIP=%s, device=%s",
         len(_detectors),
         "ok" if _clip_model else "missing",
+        _DEVICE,
     )
         if image.mode != "RGB":
             image = image.convert("RGB")
         detector_weights = [0.4, 0.3, 0.2, 0.1]
         total_w = 0.0
         weighted_fake = 0.0
         ensemble_score = (weighted_fake / total_w) if total_w > 0 else 0.5
         dct_score = self._dct_frequency_score(image)
         fake_score = float(np.clip(ensemble_score * 0.85 + dct_score * 0.15, 0.0, 1.0))
                 truncation=True,
                 max_length=77,
             )
+            # Move all tensors to the selected device (GPU when available, else CPU)
+            inputs = {k: v.to(_DEVICE) for k, v in inputs.items()}
             with torch.no_grad():
+                with torch.cuda.amp.autocast(enabled=(_DEVICE == "cuda")):
+                    outputs = _clip_model(**inputs)
+                    logits = outputs.logits_per_image[0].float()
+                    image_embeds = outputs.image_embeds.detach().float().cpu().numpy()[0]
+            _thread_local.last_clip_embedding = image_embeds
             probs = logits.softmax(dim=0).cpu().numpy()
             max_prob = float(np.max(probs))
             if max_prob < 0.32:
                 generator = "unknown_generative"
             else:
             _thread_local.last_clip_embedding = None
             return "unknown_generative" if fake_score > 0.5 else "real"
+    def _batch_clip_attribution(
+        self, images: list[Image.Image], fake_scores: list[float]
+    ) -> list[str]:
         """
+        Single batched CLIP forward pass for all keyframes — far faster than
+        calling _attribute_generator() once per frame on GPU.
         """
+        if _clip_model is None or _clip_processor is None or not images:
+            return [
+                "unknown_generative" if s > 0.5 else "real" for s in fake_scores
+            ]
+        try:
+            texts = list(GENERATOR_PROMPTS.values())
+            inputs = _clip_processor(
+                text=texts,
+                images=images,
+                return_tensors="pt",
+                padding=True,
+                truncation=True,
+                max_length=77,
+            )
+            inputs = {k: v.to(_DEVICE) for k, v in inputs.items()}
+            with torch.no_grad():
+                with torch.cuda.amp.autocast(enabled=(_DEVICE == "cuda")):
+                    # logits_per_image: (N_images, N_texts)
+                    logits = _clip_model(**inputs).logits_per_image.float()
+            probs_batch = logits.softmax(dim=-1).cpu().numpy()  # (N, 9)
+            keys = list(GENERATOR_PROMPTS.keys())
+            results: list[str] = []
+            for i, fake_score in enumerate(fake_scores):
+                probs = probs_batch[i]
+                max_prob = float(np.max(probs))
+                if max_prob < 0.32:
+                    gen = "unknown_generative"
+                else:
+                    gen = keys[int(np.argmax(probs))]
+                if fake_score > 0.65 and gen == "real":
+                    gen = "unknown_generative"
+                if fake_score < 0.35 and gen != "real":
+                    gen = "real"
+                results.append(gen)
+            return results
+        except Exception as exc:
+            logger.warning("Batch CLIP attribution error: %s", _short_error(exc))
+            return [
+                "unknown_generative" if s > 0.5 else "real" for s in fake_scores
+            ]
+    def _dct_frequency_score(self, image: Image.Image) -> float:
+        """DCT frequency band analysis (paper §III-B). Runs on CPU (block-level)."""
         try:
             from scipy.fft import dctn  # type: ignore
             gray = np.array(image.convert("L"), dtype=np.float32)
             h, w = gray.shape
             bh, bw = h - h % 8, w - w % 8
             if bh < 8 or bw < 8:
                 return 0.3
             crop = gray[:bh, :bw]
             blocks = crop.reshape(bh // 8, 8, bw // 8, 8).transpose(0, 2, 1, 3)
             n_bh, n_bw = blocks.shape[:2]
                 return 0.3
             ac_ratio = 1.0 - (dc_energy_total / all_energy_total)
+            return float(np.clip(abs(ac_ratio - 0.85) / 0.15, 0.0, 1.0))
         except Exception as exc:
             logger.warning("DCT frequency score error: %s", _short_error(exc))
             return 0.3
                 processing_time_ms=0.0,
             )
+        self._ensure()
         keyframes = frames[::8] or [frames[0]]
+        keyframes_pil = [
+            Image.fromarray(f).convert("RGB") for f in keyframes
+        ]
+        # Weighted detector scores, computed one keyframe at a time
+        detector_weights = [0.4, 0.3, 0.2, 0.1]
+        frame_scores: list[float] = []
+        for img in keyframes_pil:
+            total_w = 0.0
+            weighted_fake = 0.0
+            for index, (model_id, det) in enumerate(_detectors):
+                try:
+                    preds = det(img)
+                    score = _fake_score_from_preds(preds)
+                    weight = detector_weights[index] if index < len(detector_weights) else 0.1
+                    weighted_fake += score * weight
+                    total_w += weight
+                except Exception:
+                    pass
+            frame_scores.append((weighted_fake / total_w) if total_w > 0 else 0.5)
+        # Single batched CLIP pass for all keyframes
+        generators = self._batch_clip_attribution(keyframes_pil, frame_scores)
+        avg_conf = float(np.mean(frame_scores))
         top_gen = max(set(generators), key=generators.count) if generators else "unknown_generative"
         return EngineResult(
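
Reading aid for `_batch_clip_attribution` above: a minimal sketch of the per-frame decision rule, assuming the softmax probabilities are already available as a label-to-probability mapping. The function name and the generator labels in the usage note are hypothetical. The thresholds (0.32, 0.65, 0.35) are the ones that appear in the diff.

```python
def attribute_generator(probs: dict[str, float], fake_score: float) -> str:
    """Pick a generator label, then reconcile it with the detector's fake score."""
    best_label = max(probs, key=probs.get)
    # A weak top probability is treated as an unknown generator
    gen = "unknown_generative" if probs[best_label] < 0.32 else best_label
    # A confident "fake" detector score overrides a "real" attribution
    if fake_score > 0.65 and gen == "real":
        gen = "unknown_generative"
    # A confident "real" detector score overrides any generator attribution
    if fake_score < 0.35 and gen != "real":
        gen = "real"
    return gen
```

For example, `attribute_generator({"real": 0.9, "generator_a": 0.1}, fake_score=0.8)` yields `"unknown_generative"`, because the detector ensemble disagrees with the CLIP attribution.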

src/engines/sstgnn/engine.py CHANGED

@@ -9,6 +9,7 @@ from pathlib import Path
 from typing import Any
 import numpy as np
 from PIL import Image
 from src.types import EngineResult
@@ -16,6 +17,10 @@ from src.types import EngineResult
 logger = logging.getLogger(__name__)
 CACHE = os.environ.get("MODEL_CACHE_DIR", "/tmp/models")
 _lock = threading.Lock()
 _load_attempted = False
 _detectors: list[Any] = []
@@ -66,7 +71,13 @@ def _short_error(exc: Exception, *, limit: int = 300) -> str:
 def _build_image_classifier(model_id: str) -> Any:
     pipeline = _get_pipeline()
-    attempts = ({"cache_dir": CACHE}, {})
     last_exc: Exception | None = None
     for kwargs in attempts:
         try:
@@ -175,7 +186,7 @@ def _load() -> None:
         logger.info("Skipping SSTGNN model load (GENAI_SKIP_MODEL_LOAD=1)")
         return
-    logger.info("Loading SSTGNN models...")
     try:
         configured_models = [
@@ -214,7 +225,7 @@ def _load() -> None:
     except Exception:
         _delaunay = None
-    logger.info("SSTGNN model load attempt complete")
 class SSTGNNEngine:
@@ -266,6 +277,34 @@ class SSTGNNEngine:
             return float(np.clip(sum(weighted_scores) / weight_total, 0.0, 1.0))
         return 0.5
     def _geometry_score(self, frame: np.ndarray) -> float:
         if _mesh is None:
             return 0.3
@@ -306,13 +345,7 @@ class SSTGNNEngine:
     def _temporal_fft_score(self, frames: list[np.ndarray]) -> float:
         """
         Pixel-wise 1D FFT over the time axis (paper §III-C / Kim et al. [7]).
-        For each pixel position in a 32×32 downsampled grid, the 1D FFT is
-        computed across T frame samples. Real video concentrates energy in the
-        DC component (slow, smooth motion). Deepfakes often exhibit elevated
-        high-frequency temporal components due to frame-level inconsistencies.
-        Returns float [0, 1] where higher = more anomalous.
         """
         try:
             import cv2  # type: ignore
@@ -320,13 +353,11 @@ class SSTGNNEngine:
             if len(frames) < 8:
                 return 0.3
-            # Sample up to 32 frames evenly
             step = max(1, len(frames) // 32)
             sampled = frames[::step][:32]
             if len(sampled) < 4:
                 return 0.3
-            # Downsample each frame to 32×32 grayscale float32
             gray_stack = np.array(
                 [
                     cv2.resize(
@@ -339,18 +370,23 @@ class SSTGNNEngine:
                 ]
             )  # shape: (T, 32, 32)
-            # 1D real FFT along time axis
-            fft_result = np.fft.rfft(gray_stack, axis=0)  # (T//2+1, 32, 32)
-            power = np.abs(fft_result) ** 2                # power spectrum
-            dc_power = power[0]                                    # (32, 32)
-            total_power = np.sum(power, axis=0) + 1e-9            # (32, 32)
-            hf_ratio = 1.0 - (dc_power / total_power)             # per-pixel HF ratio
             mean_hf = float(np.mean(hf_ratio))
-            # Real video: mean_hf ≈ 0.20–0.40 (most energy in slow motion).
-            # Deepfakes deviate in either direction (flickering >0.55 or
-            # unnaturally smooth <0.10). Centre of normal range = 0.30.
             score = float(np.clip(abs(mean_hf - 0.30) / 0.25, 0.0, 1.0))
             return score
@@ -373,13 +409,23 @@ class SSTGNNEngine:
             )
         sample = frames[::6] or [frames[0]]
-        results = [self.run(Image.fromarray(frame)) for frame in sample]
-        cnn_geo_avg = float(np.mean([r.confidence for r in results]))
-        # Pixel-wise temporal FFT (paper §III-C / Kim et al. [7])
         fft_score = self._temporal_fft_score(frames)
-        # Final: CNN+geometry 80%, temporal FFT 20%
         avg = float(np.clip(cnn_geo_avg * 0.80 + fft_score * 0.20, 0.0, 1.0))
         return EngineResult(

 from typing import Any
 import numpy as np
+import torch
 from PIL import Image
 from src.types import EngineResult
 logger = logging.getLogger(__name__)
 CACHE = os.environ.get("MODEL_CACHE_DIR", "/tmp/models")
+# GPU device selection
+_DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+_PIPELINE_DEVICE = 0 if _DEVICE == "cuda" else -1  # HF pipeline convention
 _lock = threading.Lock()
 _load_attempted = False
 _detectors: list[Any] = []
 def _build_image_classifier(model_id: str) -> Any:
     pipeline = _get_pipeline()
+    # Try with GPU first, fall back gracefully
+    attempts: tuple[dict, ...] = (
+        {"cache_dir": CACHE, "device": _PIPELINE_DEVICE},
+        {"device": _PIPELINE_DEVICE},
+        {"cache_dir": CACHE},
+        {},
+    )
     last_exc: Exception | None = None
     for kwargs in attempts:
         try:
         logger.info("Skipping SSTGNN model load (GENAI_SKIP_MODEL_LOAD=1)")
         return
+    logger.info("Loading SSTGNN models on device=%s ...", _DEVICE)
     try:
         configured_models = [
     except Exception:
         _delaunay = None
+    logger.info("SSTGNN model load attempt complete (device=%s)", _DEVICE)
 class SSTGNNEngine:
             return float(np.clip(sum(weighted_scores) / weight_total, 0.0, 1.0))
         return 0.5
+    def _batch_cnn_scores(self, images: list[Image.Image]) -> list[float]:
+        """
+        Pass a batch of images through each detector at once — HF pipeline
+        accepts a list and handles batching internally on GPU.
+        """
+        if not _detectors or not images:
+            return [0.5] * len(images)
+        n = len(images)
+        weighted_totals = [0.0] * n
+        weight_sum = 0.0
+        for index, detector in enumerate(_detectors):
+            weight = _detector_weights[index] if index < len(_detector_weights) else 1.0
+            try:
+                # Pass the full list — GPU pipeline processes all frames in one batch
+                batch_preds = detector(images)
+                for i, preds in enumerate(batch_preds):
+                    score = _fake_prob_from_preds(preds if isinstance(preds, list) else [preds])
+                    weighted_totals[i] += score * max(weight, 0.0)
+                weight_sum += max(weight, 0.0)
+            except Exception as exc:
+                logger.warning("SSTGNN batch detector error: %s", _short_error(exc))
+        if weight_sum > 0.0:
+            return [float(np.clip(w / weight_sum, 0.0, 1.0)) for w in weighted_totals]
+        return [0.5] * n
     def _geometry_score(self, frame: np.ndarray) -> float:
         if _mesh is None:
             return 0.3
     def _temporal_fft_score(self, frames: list[np.ndarray]) -> float:
         """
         Pixel-wise 1D FFT over the time axis (paper §III-C / Kim et al. [7]).
+        Uses torch.fft on GPU for ~10× speedup over numpy on A100.
         """
         try:
             import cv2  # type: ignore
             if len(frames) < 8:
                 return 0.3
             step = max(1, len(frames) // 32)
             sampled = frames[::step][:32]
             if len(sampled) < 4:
                 return 0.3
             gray_stack = np.array(
                 [
                     cv2.resize(
                 ]
             )  # shape: (T, 32, 32)
+            if _DEVICE == "cuda":
+                # GPU path: torch.fft on A100 is dramatically faster
+                gray_tensor = torch.from_numpy(gray_stack).to(_DEVICE)  # (T, 32, 32)
+                fft_result = torch.fft.rfft(gray_tensor, dim=0)          # (T//2+1, 32, 32)
+                power = torch.abs(fft_result) ** 2
+                dc_power = power[0].cpu().numpy()
+                total_power = (torch.sum(power, dim=0) + 1e-9).cpu().numpy()
+            else:
+                # CPU fallback
+                fft_result = np.fft.rfft(gray_stack, axis=0)
+                power = np.abs(fft_result) ** 2
+                dc_power = power[0]
+                total_power = np.sum(power, axis=0) + 1e-9
+            hf_ratio = 1.0 - (dc_power / total_power)
             mean_hf = float(np.mean(hf_ratio))
             score = float(np.clip(abs(mean_hf - 0.30) / 0.25, 0.0, 1.0))
             return score
             )
         sample = frames[::6] or [frames[0]]
+        sample_pil = [Image.fromarray(f) for f in sample]
+        # Batched CNN scoring — single pipeline call per detector for all frames
+        cnn_scores = self._batch_cnn_scores(sample_pil)
+        # Geometry scores still per-frame (MediaPipe is CPU-only)
+        geo_scores = [self._geometry_score(np.array(img)) for img in sample_pil]
+        per_frame = [
+            float(np.clip(c * 0.70 + g * 0.30, 0.0, 1.0))
+            for c, g in zip(cnn_scores, geo_scores)
+        ]
+        cnn_geo_avg = float(np.mean(per_frame))
+        # Temporal FFT on GPU
         fft_score = self._temporal_fft_score(frames)
         avg = float(np.clip(cnn_geo_avg * 0.80 + fft_score * 0.20, 0.0, 1.0))
         return EngineResult(
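
Reading aid for `_temporal_fft_score` above: a minimal sketch of the per-pixel temporal spectrum score. To stay dependency-free it swaps `np.fft.rfft` / `torch.fft.rfft` for a naive DFT over the same one-sided bins, so it is only suitable for tiny inputs. The function name and the input layout (one intensity series per pixel) are assumptions made for illustration.

```python
import cmath
import math


def temporal_hf_score(series_per_pixel: list[list[float]]) -> float:
    """series_per_pixel[p] holds the intensity of pixel p across T sampled frames."""
    if not series_per_pixel:
        return 0.3  # same low default the diff returns when it cannot score
    ratios = []
    for series in series_per_pixel:
        t = len(series)
        # Power of the one-sided spectrum, bins 0..T//2 (the bins rfft returns)
        power = []
        for k in range(t // 2 + 1):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / t)
                for n, x in enumerate(series)
            )
            power.append(abs(coeff) ** 2)
        # Share of energy outside the DC bin for this pixel
        ratios.append(1.0 - power[0] / (sum(power) + 1e-9))
    mean_hf = sum(ratios) / len(ratios)
    # Distance from the 0.30 centre, scaled so a deviation of 0.25 saturates at 1.0
    return min(1.0, max(0.0, abs(mean_hf - 0.30) / 0.25))
```

Because the score measures distance from 0.30 in either direction, a perfectly static pixel series and a rapidly flickering one both saturate at 1.0.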

src/explainability/explainer.py CHANGED

@@ -2,21 +2,12 @@ from __future__ import annotations
 import logging
 import os
-import queue
-import threading
 from src.types import DetectionResponse, EngineResult
 logger = logging.getLogger(__name__)
-try:
-    from google import genai as genai_new  # type: ignore
-except Exception:
-    genai_new = None
-genai_legacy = None
 SYSTEM_INSTRUCTION = (
     "You are a deepfake forensics analyst writing reports for security professionals. "
     "Given detection engine outputs, write exactly 2-3 sentences in plain English "
@@ -27,229 +18,88 @@ SYSTEM_INSTRUCTION = (
 )
 DEFAULT_MODEL_CANDIDATES = (
-    # Source: https://ai.google.dev/models/gemini (checked March 2026).
-    # Prefer current Gemini 3 model codes first, then compatibility fallbacks.
-    "gemini-3-pro-preview",
-    "gemini-3-flash-preview",
-    "gemini-3-pro-image-preview",
-    "gemini-3.1-pro-preview",
-    "gemini-3.1-pro-preview-customtools",
-    "gemini-3.1-flash-lite-preview",
-    "gemini-2.5-pro",
-    "gemini-2.5-flash",
-    "gemini-2.5-flash-lite",
 )
 _configured_candidates = [
     value.strip()
-    for value in os.environ.get("GEMINI_MODEL_CANDIDATES", "").split(",")
     if value.strip()
 ]
-MODEL_CANDIDATES = tuple(_configured_candidates) if _configured_candidates else DEFAULT_MODEL_CANDIDATES
-REQUEST_TIMEOUT_S = float(os.environ.get("GEMINI_REQUEST_TIMEOUT_S", "10"))
-MAX_MODEL_ATTEMPTS = max(1, int(os.environ.get("GEMINI_MAX_MODEL_ATTEMPTS", "3")))
-ENABLE_LEGACY_MODEL_DISCOVERY = os.environ.get("GEMINI_DISCOVER_MODELS", "").strip().lower() in {
-    "1",
-    "true",
-    "yes",
-    "on",
-}
-_new_client = None
-_legacy_model = None
-_legacy_model_name = None
-_legacy_candidates = None
 def _get_api_key() -> str:
-    return os.environ.get("GEMINI_API_KEY", "").strip()
-def _run_with_timeout(func, timeout_s: float):
-    result_q: queue.Queue[tuple[bool, object]] = queue.Queue(maxsize=1)
-    def _runner() -> None:
-        try:
-            result_q.put((True, func()))
-        except Exception as exc:  # pragma: no cover - passthrough
-            result_q.put((False, exc))
-    thread = threading.Thread(target=_runner, daemon=True)
-    thread.start()
-    try:
-        ok, payload = result_q.get(timeout=timeout_s)
-    except queue.Empty as exc:
-        raise TimeoutError(f"Gemini request timed out after {timeout_s:.1f}s") from exc
-    if ok:
-        return payload
-    raise payload  # type: ignore[misc]
-def _ensure_new_client():
-    global _new_client
-    if _new_client is not None:
-        return _new_client
-    if genai_new is None:
-        return None
     api_key = _get_api_key()
     if not api_key:
-        return None
     try:
-        _new_client = genai_new.Client(api_key=api_key)
-        return _new_client
     except Exception as exc:
-        logger.warning("Failed to init google.genai client: %s", exc)
-        return None
-def _generate_with_new_sdk(prompt: str) -> str:
-    client = _ensure_new_client()
-    if client is None:
-        raise RuntimeError("google.genai client unavailable")
-    full_prompt = f"{SYSTEM_INSTRUCTION}\n\n{prompt}"
     last_error: Exception | None = None
-    for model_name in MODEL_CANDIDATES:
-        try:
-            response = _run_with_timeout(
-                lambda: client.models.generate_content(
-                    model=model_name,
-                    contents=full_prompt,
-                ),
-                REQUEST_TIMEOUT_S,
-            )
-            text = getattr(response, "text", None)
-            if text and str(text).strip():
-                logger.info("Gemini explain model selected (new SDK): %s", model_name)
-                return str(text).strip()
-        except Exception as exc:
-            last_error = exc
-            logger.debug("Gemini model %s failed on new SDK: %s", model_name, exc)
-    if last_error:
-        raise last_error
-    raise RuntimeError("No Gemini model succeeded via new SDK")
-def _ensure_legacy_configured() -> bool:
-    global genai_legacy
-    if genai_legacy is None:
-        try:
-            import google.generativeai as _legacy  # type: ignore
-            genai_legacy = _legacy
-        except Exception:
-            return False
-    if genai_legacy is None:
-        return False
-    api_key = _get_api_key()
-    if not api_key:
-        return False
-    try:
-        genai_legacy.configure(api_key=api_key)
-        return True
-    except Exception as exc:
-        logger.warning("Failed to configure legacy Gemini SDK: %s", exc)
-        return False
-def _legacy_model_candidates() -> tuple[str, ...]:
-    global _legacy_candidates
-    if _legacy_candidates is not None:
-        return _legacy_candidates
-    ordered = list(MODEL_CANDIDATES)
-    if not ENABLE_LEGACY_MODEL_DISCOVERY:
-        _legacy_candidates = tuple(ordered)
-        return _legacy_candidates
-    if genai_legacy is None:
-        _legacy_candidates = tuple(ordered)
-        return _legacy_candidates
-    try:
-        discovered: list[str] = []
-        for model in genai_legacy.list_models(request_options={"timeout": REQUEST_TIMEOUT_S}):
-            methods = set(getattr(model, "supported_generation_methods", []) or [])
-            if "generateContent" not in methods:
-                continue
-            name = str(getattr(model, "name", "")).strip()
-            if not name:
-                continue
-            short = name.split("/", 1)[-1]
-            discovered.append(short)
-        if discovered:
-            preferred = [name for name in ordered if name in discovered]
-            remainder = [name for name in discovered if name not in preferred]
-            _legacy_candidates = tuple(preferred + remainder)
-        else:
-            _legacy_candidates = tuple(ordered)
-    except Exception as exc:
-        logger.warning("Could not list Gemini models from legacy SDK: %s", exc)
-        _legacy_candidates = tuple(ordered)
-    return _legacy_candidates
-def _generate_with_legacy_sdk(prompt: str) -> str:
-    global _legacy_model, _legacy_model_name
-    if not _ensure_legacy_configured():
-        raise RuntimeError("legacy Gemini SDK unavailable")
-    if _legacy_model is not None:
-        try:
-            response = _run_with_timeout(
-                lambda: _legacy_model.generate_content(
-                    prompt,
-                    request_options={"timeout": REQUEST_TIMEOUT_S},
-                ),
-                REQUEST_TIMEOUT_S + 1.0,
-            )
-            text = (getattr(response, "text", None) or "").strip()
-            if text:
-                return text
-        except Exception as exc:
-            logger.warning("Cached Gemini model %s failed: %s", _legacy_model_name, exc)
-            _legacy_model = None
-            _legacy_model_name = None
-    last_error: Exception | None = None
-    for model_name in _legacy_model_candidates()[:MAX_MODEL_ATTEMPTS]:
         try:
-            candidate = genai_legacy.GenerativeModel(
-                model_name=model_name,
-                system_instruction=SYSTEM_INSTRUCTION,
-            )
-            response = _run_with_timeout(
-                lambda: candidate.generate_content(
-                    prompt,
-                    request_options={"timeout": REQUEST_TIMEOUT_S},
-                ),
-                REQUEST_TIMEOUT_S + 1.0,
             )
-            text = (getattr(response, "text", None) or "").strip()
-            if text:
-                _legacy_model = candidate
-                _legacy_model_name = model_name
-                logger.info("Gemini explain model selected (legacy SDK): %s", model_name)
-                return text
         except Exception as exc:
             last_error = exc
-            logger.debug("Gemini model %s failed on legacy SDK: %s", model_name, exc)
-    if last_error:
         raise last_error
-    raise RuntimeError("No Gemini model succeeded via legacy SDK")
 def explain(
@@ -271,12 +121,9 @@ def explain(
     )
     try:
-        if genai_new is not None:
-            return _generate_with_new_sdk(prompt)
-        return _generate_with_legacy_sdk(prompt)
     except Exception as exc:
-        logger.error("Gemini explain failed: %s", exc)
         top = engine_results[0] if engine_results else None
         primary = f"Primary signal came from the {top.engine} engine." if top else ""
         return (

 import logging
 import os
+from typing import Any
 from src.types import DetectionResponse, EngineResult
 logger = logging.getLogger(__name__)
 SYSTEM_INSTRUCTION = (
     "You are a deepfake forensics analyst writing reports for security professionals. "
     "Given detection engine outputs, write exactly 2-3 sentences in plain English "
 )
 DEFAULT_MODEL_CANDIDATES = (
+    "meta/llama-3.1-8b-instruct",
 )
 _configured_candidates = [
     value.strip()
+    for value in os.environ.get("NVIDIA_MODEL_CANDIDATES", "").split(",")
     if value.strip()
 ]
+MODEL_CANDIDATES = (
+    tuple(_configured_candidates)
+    if _configured_candidates
+    else DEFAULT_MODEL_CANDIDATES
+)
+REQUEST_TIMEOUT_S = float(os.environ.get("NVIDIA_REQUEST_TIMEOUT_S", "20"))
+MAX_MODEL_ATTEMPTS = max(1, int(os.environ.get("NVIDIA_MAX_MODEL_ATTEMPTS", "3")))
+TEMPERATURE = float(os.environ.get("NVIDIA_EXPLAIN_TEMPERATURE", "0.3"))
+TOP_P = float(os.environ.get("NVIDIA_EXPLAIN_TOP_P", "0.95"))
+MAX_TOKENS = int(os.environ.get("NVIDIA_EXPLAIN_MAX_TOKENS", "300"))
+BASE_URL = os.environ.get("NVIDIA_BASE_URL", "https://integrate.api.nvidia.com/v1").strip()
+_client: Any | None = None
 def _get_api_key() -> str:
+    return (
+        os.environ.get("NVIDIA_API_KEY", "").strip()
+        or os.environ.get("OPENAI_API_KEY", "").strip()
+    )
+def _get_client():
+    global _client
+    if _client is not None:
+        return _client
     api_key = _get_api_key()
     if not api_key:
+        raise RuntimeError("NVIDIA_API_KEY is not configured")
     try:
+        from openai import OpenAI
     except Exception as exc:
+        raise RuntimeError("openai package is not installed") from exc
+    _client = OpenAI(
+        base_url=BASE_URL,
+        api_key=api_key,
+        timeout=REQUEST_TIMEOUT_S,
+        max_retries=1,
+    )
+    return _client
+def _generate(prompt: str) -> str:
+    client = _get_client()
     last_error: Exception | None = None
+    for model_name in MODEL_CANDIDATES[:MAX_MODEL_ATTEMPTS]:
         try:
+            response = client.chat.completions.create(
+                model=model_name,
+                messages=[
+                    {"role": "system", "content": SYSTEM_INSTRUCTION},
+                    {"role": "user", "content": prompt},
+                ],
+                temperature=TEMPERATURE,
+                top_p=TOP_P,
+                max_tokens=MAX_TOKENS,
+                stream=False,
             )
+            content = response.choices[0].message.content
+            if content and content.strip():
+                logger.info("NVIDIA explain model selected: %s", model_name)
+                return content.strip()
         except Exception as exc:
             last_error = exc
+            logger.debug("NVIDIA explain model %s failed: %s", model_name, exc)
+    if last_error is not None:
         raise last_error
+    raise RuntimeError("No NVIDIA model candidates succeeded")
 def explain(
     )
     try:
+        return _generate(prompt)
     except Exception as exc:
+        logger.error("NVIDIA explain failed: %s", exc)
         top = engine_results[0] if engine_results else None
         primary = f"Primary signal came from the {top.engine} engine." if top else ""
         return (

src/fusion/fuser.py CHANGED Viewed

@@ -1,36 +1,31 @@
 """
 src/fusion/fuser.py — Multi-engine evidence fusion.
-Implements Dempster-Shafer (DS) evidence theory combination of the three
-detection engine outputs (paper §III-E / Module 5).
-DS replaces the previous simple weighted average. Each engine produces a
-Basic Probability Assignment (BPA) over {FAKE, REAL, Θ} where Θ is the
-set of all hypotheses (total ignorance). DS combination normalises away
-the conflict between contradictory masses, yielding a combined BPA that
-reflects consensus while respecting uncertainty.
-The final confidence is derived via the pignistic probability transform
-(Smets), which distributes the ignorance mass equally between FAKE and REAL.
 """
 from __future__ import annotations
 import numpy as np
 from src.types import DetectionResponse, EngineResult
-# Engine reliability weights used to build each engine's BPA.
-# Higher weight → engine commits more mass to its verdict, less to Θ.
-ENGINE_RELIABILITY: dict[str, float] = {
-    "fingerprint": 0.70,
-    "coherence":   0.65,
-    "sstgnn":      0.60,
-}
-ENGINE_RELIABILITY_VIDEO: dict[str, float] = {
-    "fingerprint": 0.55,
-    "coherence":   0.75,
-    "sstgnn":      0.65,
-}
 # Attribution priority: which engine's generator label is most trusted
 ATTRIBUTION_PRIORITY: dict[str, int] = {
@@ -39,8 +34,63 @@ ATTRIBUTION_PRIORITY: dict[str, int] = {
     "coherence":   3,
 }
-# Type alias for a Basic Probability Assignment over {FAKE, REAL, Θ}
-_BPA = dict[str, float]
 def _normalize_generator(value: str | None) -> str:
@@ -49,99 +99,33 @@ def _normalize_generator(value: str | None) -> str:
     return str(value).strip().lower().replace(" ", "_")
-def _engine_to_bpa(result: EngineResult, is_video: bool = False) -> _BPA:
-    """
-    Convert an EngineResult into a Basic Probability Assignment.
-    The engine reliability weight (w) determines how much mass is committed
-    to the engine's verdict vs. left as ignorance (Θ).
-    BPA structure:
-        m({FAKE}) + m({REAL}) + m(Θ) = 1.0
-    """
-    weights = ENGINE_RELIABILITY_VIDEO if is_video else ENGINE_RELIABILITY
-    w = weights.get(result.engine, 0.50)
-    c = float(result.confidence)
-    if result.verdict == "UNKNOWN":
-        return {"FAKE": 0.0, "REAL": 0.0, "Θ": 1.0}
-    if result.verdict == "FAKE":
-        return {
-            "FAKE": c * w,
-            "REAL": (1.0 - c) * w,
-            "Θ":    1.0 - w,
-        }
-    # verdict == "REAL"
-    return {
-        "REAL": c * w,
-        "FAKE": (1.0 - c) * w,
-        "Θ":    1.0 - w,
-    }
-def _ds_combine(m1: _BPA, m2: _BPA) -> _BPA:
-    """
-    Dempster's combination rule for two BPAs over {FAKE, REAL, Θ}.
-    K = conflict = Σ_{A∩B=∅} m1(A)·m2(B)
-    m12(C) = Σ_{A∩B=C} m1(A)·m2(B) / (1 - K)   for C ≠ ∅
-    """
-    # Conflict mass: FAKE ∩ REAL = ∅, so conflict = FAKE×REAL + REAL×FAKE
-    K = m1["FAKE"] * m2["REAL"] + m1["REAL"] * m2["FAKE"]
-    # Unnormalised joint masses
-    raw_fake = (
-        m1["FAKE"] * m2["FAKE"]    # FAKE ∩ FAKE = FAKE
-        + m1["FAKE"] * m2["Θ"]    # FAKE ∩ Θ    = FAKE
-        + m1["Θ"]   * m2["FAKE"]  # Θ    ∩ FAKE = FAKE
-    )
-    raw_real = (
-        m1["REAL"] * m2["REAL"]
-        + m1["REAL"] * m2["Θ"]
-        + m1["Θ"]   * m2["REAL"]
-    )
-    raw_theta = m1["Θ"] * m2["Θ"]  # Θ ∩ Θ = Θ
-    norm = 1.0 - K
-    if norm < 1e-9:
-        # Total conflict → maximum uncertainty
-        return {"FAKE": 0.5, "REAL": 0.5, "Θ": 0.0}
-    return {
-        "FAKE": raw_fake  / norm,
-        "REAL": raw_real  / norm,
-        "Θ":    raw_theta / norm,
-    }
 def fuse(results: list[EngineResult], is_video: bool = False) -> tuple[str, float, str]:
     """
-    Dempster-Shafer fusion of engine results.
     Returns (verdict, confidence_for_verdict, attributed_generator).
-    Confidence is derived via the pignistic probability transform (Smets 1990):
-    ignorance mass Θ is split equally between FAKE and REAL before thresholding.
-    This avoids overconfident verdicts when engines disagree.
     """
     active = [r for r in results if r.verdict != "UNKNOWN"]
     if not active:
         return "UNKNOWN", 0.5, "unknown_generative"
-    # Build and combine BPAs iteratively
-    bpas = [_engine_to_bpa(r, is_video) for r in active]
-    combined = bpas[0]
-    for bpa in bpas[1:]:
-        combined = _ds_combine(combined, bpa)
-    # Pignistic transform: distribute Θ mass equally
-    theta = combined.get("Θ", 0.0)
-    pign_fake = combined["FAKE"] + theta / 2.0
-    pign_real = combined["REAL"] + theta / 2.0
-    pign_total = pign_fake + pign_real + 1e-9
-    fake_prob = float(np.clip(pign_fake / pign_total, 0.0, 1.0))
     verdict = "FAKE" if fake_prob > 0.5 else "REAL"
     confidence = fake_prob if verdict == "FAKE" else (1.0 - fake_prob)
@@ -178,17 +162,28 @@ class Fuser:
                 engine_breakdown=[],
             )
-        verdict, confidence, generator = fuse(results, is_video=(media_type == "video"))
         if verdict == "UNKNOWN":
             explanation = "No active engine outputs were available."
         else:
-            summary = ", ".join(
-                f"{result.engine}:{result.verdict}({result.confidence:.2f})"
-                for result in results
             )
             explanation = (
-                f"Dempster-Shafer fusion ({media_type}) from engines: {summary}."
             )
         return DetectionResponse(

 """
 src/fusion/fuser.py — Multi-engine evidence fusion.
+Implements attention-weighted MLP fusion of the three detection engine
+outputs (paper §III-E / Module 5).
+Architecture (Eq. 5 in paper):
+    alpha = softmax(W2 @ ReLU(W1 @ s + b1) + b2)
+    FakeScore = dot(alpha, s)
+where s = [s_fingerprint, s_coherence, s_sstgnn] are per-engine fake
+probability scores in [0, 1].
+Default MLP weights encode engine reliability priors without requiring a
+trained calibration set. Replace with calibration-trained weights by setting
+MODEL_WEIGHTS_PATH to a .npz file containing W1, b1, W2, b2_image, b2_video arrays.
 """
 from __future__ import annotations
+import logging
+import os
+from pathlib import Path
 import numpy as np
 from src.types import DetectionResponse, EngineResult
+logger = logging.getLogger(__name__)
 # Attribution priority: which engine's generator label is most trusted
 ATTRIBUTION_PRIORITY: dict[str, int] = {
     "coherence":   3,
 }
+# Engine order — must match the dimension layout of all weight arrays
+_ENGINE_ORDER = ("fingerprint", "coherence", "sstgnn")
+# Default MLP weights (3-in → 3-hidden → 3-out, identity-pass-through)
+# b2 encodes log-prior attention: fingerprint=0.45, coherence=0.35, sstgnn=0.20 (image)
+#                             or: coherence=0.45, fingerprint=0.35, sstgnn=0.20 (video)
+_W1_DEFAULT = np.eye(3, dtype=np.float64)
+_b1_DEFAULT = np.zeros(3, dtype=np.float64)
+_W2_DEFAULT = np.eye(3, dtype=np.float64)
+_b2_image_DEFAULT = np.array([np.log(0.45), np.log(0.35), np.log(0.20)], dtype=np.float64)
+_b2_video_DEFAULT = np.array([np.log(0.35), np.log(0.45), np.log(0.20)], dtype=np.float64)
+# Runtime weight tensors (replaced if MODEL_WEIGHTS_PATH is set)
+_W1 = _W1_DEFAULT.copy()
+_b1 = _b1_DEFAULT.copy()
+_W2 = _W2_DEFAULT.copy()
+_b2_image = _b2_image_DEFAULT.copy()
+_b2_video = _b2_video_DEFAULT.copy()
+def _load_calibration_weights(path: str) -> bool:
+    """Load calibration-trained MLP weights from a .npz file."""
+    global _W1, _b1, _W2, _b2_image, _b2_video
+    try:
+        data = np.load(path)
+        _W1 = data["W1"].astype(np.float64)
+        _b1 = data["b1"].astype(np.float64)
+        _W2 = data["W2"].astype(np.float64)
+        _b2_image = data["b2_image"].astype(np.float64)
+        _b2_video = data["b2_video"].astype(np.float64)
+        logger.info("Loaded fusion MLP weights from %s", path)
+        return True
+    except Exception as exc:
+        logger.warning("Could not load fusion weights from %s: %s — using defaults", path, exc)
+        return False
+_weights_path = os.environ.get("MODEL_WEIGHTS_PATH", "")
+if _weights_path and Path(_weights_path).exists():
+    _load_calibration_weights(_weights_path)
+def _softmax(x: np.ndarray) -> np.ndarray:
+    x = x - x.max()
+    e = np.exp(x)
+    return e / (e.sum() + 1e-9)
+def _attention_weights(s: np.ndarray, is_video: bool) -> np.ndarray:
+    """
+    Two-layer MLP: alpha = softmax(W2 @ ReLU(W1 @ s + b1) + b2)
+    Returns a 3-vector of attention weights summing to 1.
+    """
+    h = np.maximum(_W1 @ s + _b1, 0.0)
+    b2 = _b2_video if is_video else _b2_image
+    logits = _W2 @ h + b2
+    return _softmax(logits)
 def _normalize_generator(value: str | None) -> str:
     return str(value).strip().lower().replace(" ", "_")
 def fuse(results: list[EngineResult], is_video: bool = False) -> tuple[str, float, str]:
     """
+    Attention-weighted MLP fusion of engine results (paper §III-E).
     Returns (verdict, confidence_for_verdict, attributed_generator).
     """
     active = [r for r in results if r.verdict != "UNKNOWN"]
     if not active:
         return "UNKNOWN", 0.5, "unknown_generative"
+    # Build per-engine fake probability scores (direction-normalised to [0,1])
+    fake_score_map: dict[str, float] = {}
+    for r in active:
+        if r.verdict == "FAKE":
+            fake_score_map[r.engine] = float(r.confidence)
+        else:
+            fake_score_map[r.engine] = 1.0 - float(r.confidence)
+    s = np.array(
+        [fake_score_map.get(eng, 0.5) for eng in _ENGINE_ORDER],
+        dtype=np.float64,
+    )
+    alpha = _attention_weights(s, is_video)
+    fake_prob = float(np.clip(float(np.dot(alpha, s)), 0.0, 1.0))
     verdict = "FAKE" if fake_prob > 0.5 else "REAL"
     confidence = fake_prob if verdict == "FAKE" else (1.0 - fake_prob)
                 engine_breakdown=[],
             )
+        is_video = media_type == "video"
+        verdict, confidence, generator = fuse(results, is_video=is_video)
         if verdict == "UNKNOWN":
             explanation = "No active engine outputs were available."
         else:
+            active = [r for r in results if r.verdict != "UNKNOWN"]
+            fake_score_map = {
+                r.engine: float(r.confidence) if r.verdict == "FAKE" else 1.0 - float(r.confidence)
+                for r in active
+            }
+            s = np.array([fake_score_map.get(e, 0.5) for e in _ENGINE_ORDER])
+            alpha = _attention_weights(s, is_video)
+            alpha_str = ", ".join(
+                f"{eng}:{w:.2f}" for eng, w in zip(_ENGINE_ORDER, alpha)
+            )
+            engines_str = ", ".join(
+                f"{r.engine}:{r.verdict}({r.confidence:.2f})" for r in results
             )
             explanation = (
+                f"Attention-MLP fusion ({media_type}): alpha=[{alpha_str}]. "
+                f"Engines: {engines_str}."
             )
         return DetectionResponse(
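
Review note: a worked instance of the fusion formula from the new docstring, `alpha = softmax(W2 @ ReLU(W1 @ s + b1) + b2)` and `FakeScore = dot(alpha, s)`, may help when checking the defaults. The sketch below is a simplified reading of the diff and makes two assumptions: it hard-codes the identity `W1`/`W2` and zero `b1` defaults, and it omits the `1e-9` term that `_softmax` adds to the denominator. The names `softmax` and `fuse_scores` are illustrative.

```python
import numpy as np

ENGINE_ORDER = ("fingerprint", "coherence", "sstgnn")
PRIOR_IMAGE = np.array([0.45, 0.35, 0.20])  # image-mode priors from the diff


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax (no epsilon, unlike the diff's _softmax)."""
    shifted = x - x.max()
    e = np.exp(shifted)
    return e / e.sum()


def fuse_scores(s: np.ndarray, prior: np.ndarray):
    """Return (alpha, fake_score) for per-engine fake probabilities s."""
    w1 = np.eye(3)
    w2 = np.eye(3)
    b1 = np.zeros(3)
    h = np.maximum(w1 @ s + b1, 0.0)   # ReLU hidden layer
    logits = w2 @ h + np.log(prior)    # b2 carries the log-priors
    alpha = softmax(logits)
    return alpha, float(alpha @ s)


# Equal scores: the exp(s) factor cancels, so alpha equals the prior.
alpha_equal, score_equal = fuse_scores(np.array([0.5, 0.5, 0.5]), PRIOR_IMAGE)

# Unequal scores: alpha_i is proportional to prior_i * exp(s_i).
alpha_mixed, score_mixed = fuse_scores(np.array([0.9, 0.2, 0.5]), PRIOR_IMAGE)
```

With these defaults the attention weights depend on the scores themselves, so an engine reporting a higher fake probability receives more weight than its prior alone would give it. Note also that `fuse` in the diff fills a missing engine with `0.5` via `fake_score_map.get(eng, 0.5)`, so an absent engine still receives attention mass.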

src/training/config.py CHANGED Viewed

@@ -14,17 +14,20 @@ from typing import List
 # Generator label index mapping — must match GeneratorLabel enum in src/types.py
 # and the classification head in every model file.
 GENERATOR_CLASSES: List[str] = [
-    "real",               # 0
-    "unknown_gan",        # 1
-    "stable_diffusion",   # 2
-    "midjourney",         # 3
-    "dall_e",             # 4
-    "flux",               # 5
-    "firefly",            # 6
-    "imagen",             # 7
 ]
-NUM_GENERATOR_CLASSES: int = len(GENERATOR_CLASSES)  # 8 — never change this
 @dataclass

 # Generator label index mapping — must match GeneratorLabel enum in src/types.py
 # and the classification head in every model file.
+# Index 0 = real (binary negative class); indices 1-8 = the 8 AI generator classes
+# from paper Table II (Sora, Runway Gen-2, Wav2Lip, SD v1.5, SDXL, MJv6, DALL-E 3, OOD).
 GENERATOR_CLASSES: List[str] = [
+    "real",                 # 0
+    "sora",                 # 1
+    "runway",               # 2
+    "wav2lip",              # 3
+    "stable_diffusion",     # 4
+    "sdxl",                 # 5
+    "midjourney",           # 6
+    "dall_e",               # 7
+    "unknown_generative",   # 8
 ]
+NUM_GENERATOR_CLASSES: int = len(GENERATOR_CLASSES) - 1  # 8 AI generators (excludes "real")
 @dataclass
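
Review note: after this change `NUM_GENERATOR_CLASSES` (8) and `len(GENERATOR_CLASSES)` (9) differ by one, which is easy to mix up when sizing arrays or lookup tables. The sketch below restates the list from the diff and adds two hypothetical helpers, `LABEL_TO_INDEX` and `index_to_label`, that are not part of the commit. Whether the model heads emit 8 or 9 logits is not visible in this diff, so that is left open.

```python
from typing import Dict, List

GENERATOR_CLASSES: List[str] = [
    "real",                # 0 (binary negative class)
    "sora",                # 1
    "runway",              # 2
    "wav2lip",             # 3
    "stable_diffusion",    # 4
    "sdxl",                # 5
    "midjourney",          # 6
    "dall_e",              # 7
    "unknown_generative",  # 8
]
NUM_GENERATOR_CLASSES: int = len(GENERATOR_CLASSES) - 1  # excludes "real"

# Hypothetical helper: reverse lookup from label name to index.
LABEL_TO_INDEX: Dict[str, int] = {
    name: index for index, name in enumerate(GENERATOR_CLASSES)
}


def index_to_label(index: int) -> str:
    """Map a class index to its label, rejecting out-of-range values."""
    if not 0 <= index < len(GENERATOR_CLASSES):
        raise IndexError(f"generator index {index} out of range")
    return GENERATOR_CLASSES[index]
```

Valid indices run from 0 to `NUM_GENERATOR_CLASSES` inclusive, so range checks should use `len(GENERATOR_CLASSES)` rather than `NUM_GENERATOR_CLASSES`.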

test_assets/README.md ADDED Viewed

	@@ -0,0 +1,5 @@

+Add short validation clips here for manual smoke tests.
+Suggested files from CLAUDE.md:
+- `real_sample.mp4`
+- `fake_sample.mp4`

tests/training/test_datasets.py CHANGED Viewed

@@ -30,10 +30,10 @@ def test_training_config_num_generator_classes():
     import sys
     sys.path.insert(0, str(Path(__file__).parent.parent.parent))
     from src.training.config import NUM_GENERATOR_CLASSES, GENERATOR_CLASSES
-    assert NUM_GENERATOR_CLASSES == 8
-    assert len(GENERATOR_CLASSES) == 8
     assert GENERATOR_CLASSES[0] == "real"
-    assert GENERATOR_CLASSES[7] == "imagen"
 def test_training_config_dataclass_defaults():

     import sys
     sys.path.insert(0, str(Path(__file__).parent.parent.parent))
     from src.training.config import NUM_GENERATOR_CLASSES, GENERATOR_CLASSES
+    assert NUM_GENERATOR_CLASSES == 8                           # 8 AI generators
+    assert len(GENERATOR_CLASSES) == NUM_GENERATOR_CLASSES + 1  # +1 for "real"
     assert GENERATOR_CLASSES[0] == "real"
+    assert GENERATOR_CLASSES[8] == "unknown_generative"
 def test_training_config_dataclass_defaults():

tests/training/test_metrics.py CHANGED Viewed

@@ -56,10 +56,10 @@ def test_training_config_consistency():
     from src.training.config import NUM_GENERATOR_CLASSES, GENERATOR_CLASSES
     from src.types import GeneratorLabel, GENERATOR_INDEX_TO_LABEL
-    assert NUM_GENERATOR_CLASSES == 8
-    assert len(GENERATOR_CLASSES) == 8
-    assert len(GeneratorLabel) == 8
-    assert len(GENERATOR_INDEX_TO_LABEL) == 8
     # All class names must map to a valid GeneratorLabel
     for name in GENERATOR_CLASSES:

     from src.training.config import NUM_GENERATOR_CLASSES, GENERATOR_CLASSES
     from src.types import GeneratorLabel, GENERATOR_INDEX_TO_LABEL
+    assert NUM_GENERATOR_CLASSES == 8                           # 8 AI generator classes
+    assert len(GENERATOR_CLASSES) == NUM_GENERATOR_CLASSES + 1  # +1 for "real"
+    assert len(GeneratorLabel) == NUM_GENERATOR_CLASSES + 1     # +1 for "real"
+    assert len(GENERATOR_INDEX_TO_LABEL) == NUM_GENERATOR_CLASSES + 1  # +1 for "real"
     # All class names must map to a valid GeneratorLabel
     for name in GENERATOR_CLASSES:

utils/__init__.py ADDED Viewed

	@@ -0,0 +1,5 @@

+from utils.graph import video_to_graph
+from utils.video import extract_audio_waveform, extract_frames
+__all__ = ["extract_audio_waveform", "extract_frames", "video_to_graph"]

utils/graph.py ADDED Viewed

	@@ -0,0 +1,45 @@

+from __future__ import annotations
+import numpy as np
+from src.engines.sstgnn.graph_builder import build_temporal_graph
+from src.services.media_utils import extract_video_frames
+KEYPOINT_STEP = 7
+KEYPOINT_COUNT = 68
+def video_to_graph(video_path: str, max_frames: int = 32):
+    import mediapipe as mp  # type: ignore
+    frames = extract_video_frames(video_path, max_frames=max_frames)
+    if not frames:
+        raise ValueError("Could not extract frames from video")
+    face_mesh = mp.solutions.face_mesh.FaceMesh(
+        static_image_mode=True,
+        max_num_faces=1,
+        refine_landmarks=True,
+    )
+    sequences: list[np.ndarray] = []
+    for frame in frames:
+        result = face_mesh.process(frame)
+        if not result.multi_face_landmarks:
+            continue
+        landmarks = result.multi_face_landmarks[0].landmark
+        selected = []
+        for index in list(range(0, 468, KEYPOINT_STEP))[:KEYPOINT_COUNT]:
+            landmark = landmarks[index]
+            selected.append([float(landmark.x), float(landmark.y), float(landmark.z)])
+        sequences.append(np.array(selected, dtype=np.float32))
+    face_mesh.close()
+    if not sequences:
+        raise ValueError("No face landmarks detected in video")
+    sequence = np.stack(sequences, axis=0)
+    return build_temporal_graph(sequence)
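
Review note: the landmark selection `list(range(0, 468, KEYPOINT_STEP))[:KEYPOINT_COUNT]` yields 67 indices, not 68, because stepping by 7 from 0 stops at 462. Each frame therefore contributes a `(67, 3)` array. Whether `build_temporal_graph` requires exactly 68 nodes is not visible in this diff. The sketch below checks the count and shows one possible alternative, assuming evenly spaced indices over the 468 base mesh landmarks are acceptable; `evenly_spaced_indices` is a hypothetical helper.

```python
import numpy as np

KEYPOINT_STEP = 7
KEYPOINT_COUNT = 68
MESH_LANDMARKS = 468  # upper bound used by the range() call in the diff

# Selection as written in the diff.
strided = list(range(0, MESH_LANDMARKS, KEYPOINT_STEP))[:KEYPOINT_COUNT]


def evenly_spaced_indices(total: int, count: int) -> list:
    """Hypothetical alternative: exactly `count` indices spread over [0, total)."""
    return np.round(np.linspace(0, total - 1, count)).astype(int).tolist()


even = evenly_spaced_indices(MESH_LANDMARKS, KEYPOINT_COUNT)
print(len(strided), len(even))
```

If 68 nodes are in fact expected downstream, a count assertion next to the selection would surface the mismatch early.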

utils/video.py ADDED Viewed

	@@ -0,0 +1,13 @@

+from __future__ import annotations
+from pathlib import Path
+from src.services.media_utils import extract_audio_waveform, extract_video_frames
+def extract_frames(video_path: str | Path, max_frames: int = 32):
+    return extract_video_frames(video_path, max_frames=max_frames)
+__all__ = ["extract_audio_waveform", "extract_frames"]

weights/README.md ADDED Viewed

	@@ -0,0 +1,5 @@

+Place optional fusion model weights here.
+Expected file from CLAUDE.md:
+- `fusion_mlp.pt`
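
Review note: the loader added to `src/fusion/fuser.py` in this commit reads `MODEL_WEIGHTS_PATH` with `np.load` and looks up the keys `W1`, `b1`, `W2`, `b2_image` and `b2_video`. The sketch below writes a NumPy archive with those keys, using the default values from the diff. The file name `fusion_mlp.npz` and the helper `save_identity_fusion_weights` are assumptions for illustration; the `.pt` name listed above comes from CLAUDE.md.

```python
import os
import tempfile

import numpy as np


def save_identity_fusion_weights(path: str) -> None:
    """Write default fusion weights under the keys the loader looks up."""
    np.savez(
        path,
        W1=np.eye(3),
        b1=np.zeros(3),
        W2=np.eye(3),
        b2_image=np.log([0.45, 0.35, 0.20]),
        b2_video=np.log([0.35, 0.45, 0.20]),
    )


with tempfile.TemporaryDirectory() as tmp:
    weights_path = os.path.join(tmp, "fusion_mlp.npz")  # hypothetical file name
    save_identity_fusion_weights(weights_path)
    # Read the archive back the same way the loader does.
    with np.load(weights_path) as data:
        loaded_keys = set(data.files)
        b2_image = data["b2_image"].copy()
```

Pointing `MODEL_WEIGHTS_PATH` at such a file should reproduce the built-in defaults, which makes it a convenient starting point before substituting calibration-trained arrays.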