Spaces: danielzest/GenD-Sentinel


yermandy committed on Nov 13, 2025

Commit c29babb, 0 parent(s): init


This view is limited to 50 files because the commit contains too many changes.

Files changed (50)

.gitattributes +3 -0
.gitignore +14 -0
.project-root +1 -0
LICENSE +21 -0
README.md +157 -0
config/datasets/CDFv2/test/Celeb-real.txt +4 -0
config/datasets/CDFv2/test/Celeb-synthesis.txt +4 -0
config/datasets/CDFv2/test/YouTube-real.txt +4 -0
config/datasets/FF/test/DF.txt +4 -0
config/datasets/FF/test/F2F.txt +4 -0
config/datasets/FF/test/FS.txt +4 -0
config/datasets/FF/test/NT.txt +4 -0
config/datasets/FF/test/real.txt +4 -0
datasets/CDFv2/Celeb-real/id0_0000/000.png +3 -0
datasets/CDFv2/Celeb-real/id0_0000/015.png +3 -0
datasets/CDFv2/Celeb-real/id0_0000/030.png +3 -0
datasets/CDFv2/Celeb-real/id0_0000/045.png +3 -0
datasets/CDFv2/Celeb-synthesis/id0_id1_0000/000.png +3 -0
datasets/CDFv2/Celeb-synthesis/id0_id1_0000/015.png +3 -0
datasets/CDFv2/Celeb-synthesis/id0_id1_0000/030.png +3 -0
datasets/CDFv2/Celeb-synthesis/id0_id1_0000/045.png +3 -0
datasets/CDFv2/YouTube-real/00000/000.png +3 -0
datasets/CDFv2/YouTube-real/00000/014.png +3 -0
datasets/CDFv2/YouTube-real/00000/028.png +3 -0
datasets/CDFv2/YouTube-real/00000/043.png +3 -0
datasets/FF/DF/000_003/000.png +3 -0
datasets/FF/DF/000_003/012.png +3 -0
datasets/FF/DF/000_003/025.png +3 -0
datasets/FF/DF/000_003/038.png +3 -0
datasets/FF/F2F/000_003/000.png +3 -0
datasets/FF/F2F/000_003/009.png +3 -0
datasets/FF/F2F/000_003/019.png +3 -0
datasets/FF/F2F/000_003/029.png +3 -0
datasets/FF/FS/000_003/000.png +3 -0
datasets/FF/FS/000_003/009.png +3 -0
datasets/FF/FS/000_003/019.png +3 -0
datasets/FF/FS/000_003/029.png +3 -0
datasets/FF/NT/000_003/000.png +3 -0
datasets/FF/NT/000_003/009.png +3 -0
datasets/FF/NT/000_003/019.png +3 -0
datasets/FF/NT/000_003/029.png +3 -0
datasets/FF/real/000/000.png +3 -0
datasets/FF/real/000/012.png +3 -0
datasets/FF/real/000/025.png +3 -0
datasets/FF/real/000/038.png +3 -0
detector.py +701 -0
pyproject.toml +37 -0
requirements.txt +34 -0
run.py +174 -0
run_exp.py +209 -0

.gitattributes ADDED Viewed

	@@ -0,0 +1,3 @@

+*.png filter=lfs diff=lfs merge=lfs -text
+*.mp4 filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text

.gitignore ADDED Viewed

	@@ -0,0 +1,14 @@

+__pycache__
+/.vscode
+/config
+/datasets
+/outputs
+/runs
+/weights
+/logs
+/tmp
+x.py
+y.py
+z.py

.project-root ADDED Viewed

	@@ -0,0 +1 @@


+# Do not remove, this file is used by the project to determine the root of the project

LICENSE ADDED Viewed

	@@ -0,0 +1,21 @@

+MIT License
+Copyright (c) 2025 Andy
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md ADDED Viewed

	@@ -0,0 +1,157 @@

+# Deepfake Detection that Generalizes Across Benchmarks (WACV 2026)
+[![arXiv Badge](https://img.shields.io/badge/arXiv-B31B1B?logo=arxiv&logoColor=FFF)](https://arxiv.org/abs/2508.06248)
+[![Hugging Face Badge](https://img.shields.io/badge/Hugging%20Face-FFD21E?logo=huggingface&logoColor=000)](https://huggingface.co/collections/yermandy/gend)
+This is the official repository for the paper:
+**[Deepfake Detection that Generalizes Across Benchmarks](https://arxiv.org/abs/2508.06248)**.
+### Abstract
+> The generalization of deepfake detectors to unseen manipulation techniques remains a challenge for practical deployment. Although many approaches adapt foundation models by introducing significant architectural complexity, this work demonstrates that robust generalization is achievable through a parameter-efficient adaptation of one of the foundational pre-trained vision encoders. The proposed method, GenD, fine-tunes only the Layer Normalization parameters (0.03% of the total) and enhances generalization by enforcing a hyperspherical feature manifold using L2 normalization and metric learning on it.
+>
+> We conducted an extensive evaluation on 14 benchmark datasets spanning from 2019 to 2025. The proposed method achieves state-of-the-art performance, outperforming more complex, recent approaches in average cross-dataset AUROC. Our analysis yields two primary findings for the field: 1) training on paired real-fake data from the same source video is essential for mitigating shortcut learning and improving generalization, and 2) detection difficulty on academic datasets has not strictly increased over time, with models trained on older, diverse datasets showing strong generalization capabilities.
+>
+> This work delivers a computationally efficient and reproducible method, proving that state-of-the-art generalization is attainable by making targeted, minimal changes to a pre-trained foundational image encoder model.
+## Inference using Hugging Face transformers
+This example shows how to run inference with the pretrained GenD model from Hugging Face with no dependencies other than `torch` and `transformers`. It expects input images that have already been preprocessed by the face detector (`detector.py`).
+### Minimal dependencies
+``` bash
+conda create --name GenD python=3.12 uv -y
+conda activate GenD
+uv pip install torch==2.8.0
+uv pip install torchvision==0.23.0
+uv pip install transformers==4.56.2
+```
+### Inference with transformers
+``` python
+import requests
+import torch
+from PIL import Image
+from src.hf.modeling_gend import GenD
+# Other models can be found in https://huggingface.co/collections/yermandy/gend:
+# - yermandy/GenD_CLIP_L_14
+# - yermandy/GenD_PE_L
+# - yermandy/GenD_DINOv3_L
+model = GenD.from_pretrained("yermandy/GenD_CLIP_L_14")
+urls = [
+    "https://github.com/yermandy/deepfake-detection/blob/main/datasets/FF/DF/000_003/000.png?raw=true",
+    "https://github.com/yermandy/deepfake-detection/blob/main/datasets/FF/real/000/000.png?raw=true",
+]
+images = [Image.open(requests.get(url, stream=True).raw) for url in urls]
+tensors = torch.stack([model.feature_extractor.preprocess(img) for img in images])
+logits = model(tensors)
+probs = logits.softmax(dim=-1)
+print(probs)
+```
+## Training
+### Set up environment
+``` bash
+conda create --name GenD python=3.12 uv -y
+conda activate GenD
+uv pip install -r requirements.txt
+```
+### Minimal example without external data
+#### Training example
+Examine `src/exp/examples.py`. Each experiment name is defined as a key, and its value overrides the default configuration of the `Config` object from `src/config.py`. For example, try running the `example-training` experiment:
+``` bash
+python run_exp.py example-training
+```
+#### Test example after the model is trained
+``` bash
+python run_exp.py example-test --from_exp example-training --test
+```
+Alternatively, you can try inference using one of our released models from Hugging Face:
+``` bash
+python run_exp.py GenD_CLIP--CDFv2-example --test
+python run_exp.py GenD_PE--CDFv2-example --test
+python run_exp.py GenD_DINO--CDFv2-example --test
+```
+### Full training
+To fully train the model, you need to download datasets, preprocess them, and create files with paths to the images.
+The training entry will be similar to the minimal example above.
+All experiments (configs) from the paper are stored in the `src/exp` folder.
+#### Prepare the dataset
+Taking the [FaceForensics++](https://github.com/ondyari/FaceForensics) dataset as an example, follow these steps:
+1. Download the dataset first from the [official source](https://github.com/ondyari/FaceForensics). The root of this dataset is `./FaceForensics`
+2. Preprocess the dataset using `detector.py` script:
+``` bash
+python detector.py -i FaceForensics/manipulated_sequences/Deepfakes/c23/videos/ --mask_folder FaceForensics/masks/manipulated_sequences/Deepfakes/masks/videos/ -m at_least -n 32 -o datasets/FF/DF/ --det_thres 0.1 -s 1.3 --target_size none
+```
+Repeat the process for other manipulation methods and real videos. After processing everything, you will get a similar structure:
+``` bash
+datasets
+└── FF
+    ├── DF
+    │   └── 000_003
+    │       ├── 025.png
+    │       └── 038.png
+    ├── F2F
+    │   └── 000_003
+    │       ├── 019.png
+    │       └── 029.png
+    ├── FS
+    │   └── 000_003
+    │       ├── 019.png
+    │       └── 029.png
+    ├── NT
+    │   └── 000_003
+    │       ├── 019.png
+    │       └── 029.png
+    └── real
+        └── 000
+            ├── 025.png
+            └── 038.png
+```
+3. Create files that list the image paths, similar to the ones in the `config/datasets` directory. This can be done with:
+``` bash
+find datasets/FF/DF/* -type f | sort > config/datasets/FF/DF.txt
+```
+We manage links to files using `src/utils/files.py`.
+### Cite
+``` bibtex
+@article{yermakov2025deepfake,
+  title={Deepfake Detection that Generalizes Across Benchmarks},
+  author={Yermakov, Andrii and Cech, Jan and Matas, Jiri and Fritz, Mario},
+  journal={arXiv preprint arXiv:2508.06248},
+  year={2025}
+}
+```
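
Step 3 of the README's "Prepare the dataset" section builds a list file with `find ... | sort`. As a minimal sketch for environments without a POSIX shell, a similar listing can be produced in Python. The helper name `write_path_list` is hypothetical and not part of the repository; the sketch assumes the list format is one path per line, as in the `config/datasets` files of this commit.

```python
from pathlib import Path


def write_path_list(image_dir: str, list_file: str) -> list[str]:
    """Write every file under image_dir to list_file, one sorted path per line."""
    # Hypothetical helper, roughly equivalent to `find <image_dir>/* -type f | sort`.
    paths = sorted(p.as_posix() for p in Path(image_dir).rglob("*") if p.is_file())
    out = Path(list_file)
    # Create the parent directory of the list file if it does not exist yet.
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("".join(f"{p}\n" for p in paths))
    return paths
```

A call such as `write_path_list("datasets/FF/DF", "config/datasets/FF/DF.txt")` would correspond to the shell command in the README. Two differences are worth keeping in mind: Python sorts by code point while `sort` follows the shell locale, and `rglob` also visits hidden entries that the `datasets/FF/DF/*` glob skips at the top level.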

config/datasets/CDFv2/test/Celeb-real.txt ADDED Viewed

	@@ -0,0 +1,4 @@

+datasets/CDFv2/Celeb-real/id0_0000/045.png
+datasets/CDFv2/Celeb-real/id0_0000/030.png
+datasets/CDFv2/Celeb-real/id0_0000/015.png
+datasets/CDFv2/Celeb-real/id0_0000/000.png

config/datasets/CDFv2/test/Celeb-synthesis.txt ADDED Viewed

	@@ -0,0 +1,4 @@

+datasets/CDFv2/Celeb-synthesis/id0_id1_0000/000.png
+datasets/CDFv2/Celeb-synthesis/id0_id1_0000/045.png
+datasets/CDFv2/Celeb-synthesis/id0_id1_0000/030.png
+datasets/CDFv2/Celeb-synthesis/id0_id1_0000/015.png

config/datasets/CDFv2/test/YouTube-real.txt ADDED Viewed

	@@ -0,0 +1,4 @@

+datasets/CDFv2/YouTube-real/00000/000.png
+datasets/CDFv2/YouTube-real/00000/014.png
+datasets/CDFv2/YouTube-real/00000/028.png
+datasets/CDFv2/YouTube-real/00000/043.png

config/datasets/FF/test/DF.txt ADDED Viewed

	@@ -0,0 +1,4 @@

+datasets/FF/DF/000_003/000.png
+datasets/FF/DF/000_003/012.png
+datasets/FF/DF/000_003/025.png
+datasets/FF/DF/000_003/038.png

config/datasets/FF/test/F2F.txt ADDED Viewed

	@@ -0,0 +1,4 @@

+datasets/FF/F2F/000_003/000.png
+datasets/FF/F2F/000_003/009.png
+datasets/FF/F2F/000_003/019.png
+datasets/FF/F2F/000_003/029.png

config/datasets/FF/test/FS.txt ADDED Viewed

	@@ -0,0 +1,4 @@

+datasets/FF/FS/000_003/000.png
+datasets/FF/FS/000_003/009.png
+datasets/FF/FS/000_003/019.png
+datasets/FF/FS/000_003/029.png

config/datasets/FF/test/NT.txt ADDED Viewed

	@@ -0,0 +1,4 @@

+datasets/FF/NT/000_003/000.png
+datasets/FF/NT/000_003/009.png
+datasets/FF/NT/000_003/019.png
+datasets/FF/NT/000_003/029.png

config/datasets/FF/test/real.txt ADDED Viewed

	@@ -0,0 +1,4 @@

+datasets/FF/real/000/000.png
+datasets/FF/real/000/012.png
+datasets/FF/real/000/025.png
+datasets/FF/real/000/038.png
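
The frame numbers in these sample paths (for example 000, 015, 030, 045) are consistent with evenly spaced sampling of four frames per video, which is what the `fixed_num_frames` mode of `detector.py` does with `np.linspace(0, total_frames - 1, num_frames, endpoint=True, dtype=int)`. The sketch below is a plain-Python approximation of that call. The function name `evenly_spaced_frame_ids` and the total frame counts are assumptions chosen for illustration, since the commit states neither the original video lengths nor the mode that produced these files.

```python
def evenly_spaced_frame_ids(total_frames: int, num_frames: int) -> list[int]:
    """Pick num_frames indices spread evenly over [0, total_frames - 1]."""
    if total_frames <= 0 or num_frames <= 0:
        return []
    if num_frames == 1:
        return [0]
    last = total_frames - 1
    # Truncate toward zero, like an integer cast of the evenly spaced positions.
    return [int(i * last / (num_frames - 1)) for i in range(num_frames)]


# Assumed (hypothetical) video lengths of 46 and 39 frames.
print(evenly_spaced_frame_ids(46, 4))  # [0, 15, 30, 45]
print(evenly_spaced_frame_ids(39, 4))  # [0, 12, 25, 38]
```

Under these assumed lengths the indices match the file names listed for `Celeb-real/id0_0000` and `FF/DF/000_003`.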

datasets/CDFv2/Celeb-real/id0_0000/000.png ADDED Viewed

Git LFS Details

SHA256: 33a652cb6ad545d41465a978ab4bb02137380db1747a1a460d193a3e0ecd4db6
Pointer size: 130 Bytes
Size of remote file: 51.8 kB

datasets/CDFv2/Celeb-real/id0_0000/015.png ADDED Viewed

Git LFS Details

SHA256: cca6e95d080ccafbd35709c0e6ce60a12b415c7f97cf40ac4c1edb7fba441e4f
Pointer size: 130 Bytes
Size of remote file: 55.2 kB

datasets/CDFv2/Celeb-real/id0_0000/030.png ADDED Viewed

Git LFS Details

SHA256: 20d6775e831ef3bab80cd928c930b0916f09b933814fe0d2b1d6e51b0106c77a
Pointer size: 130 Bytes
Size of remote file: 56.3 kB

datasets/CDFv2/Celeb-real/id0_0000/045.png ADDED Viewed

Git LFS Details

SHA256: 6ca1683cfe93a01ac7800de0efa5f82c38abb5b17582c5708671de567958e08e
Pointer size: 130 Bytes
Size of remote file: 57.8 kB

datasets/CDFv2/Celeb-synthesis/id0_id1_0000/000.png ADDED Viewed

Git LFS Details

SHA256: cae26283688b2e1855b75b922f75e0945dd29f1669e5c399b9e0f5bc75a4700c
Pointer size: 130 Bytes
Size of remote file: 51.5 kB

datasets/CDFv2/Celeb-synthesis/id0_id1_0000/015.png ADDED Viewed

Git LFS Details

SHA256: 708f825b0fa8403e5db2966d96e4be4b18f4d1f55ab7352acdc2bd2d7542ee5b
Pointer size: 130 Bytes
Size of remote file: 54.6 kB

datasets/CDFv2/Celeb-synthesis/id0_id1_0000/030.png ADDED Viewed

Git LFS Details

SHA256: 5693e8c8469630a8699f6cc4b8a51edd4297d31a956a14deba1a1b7796230ec4
Pointer size: 130 Bytes
Size of remote file: 54 kB

datasets/CDFv2/Celeb-synthesis/id0_id1_0000/045.png ADDED Viewed

Git LFS Details

SHA256: 90c2250095d7331e64caa42d04d25cac5881288bbfa7013e68302cbe758ade06
Pointer size: 130 Bytes
Size of remote file: 56.1 kB

datasets/CDFv2/YouTube-real/00000/000.png ADDED Viewed

Git LFS Details

SHA256: c959f785348832e39685a4918904b3a145ff475ce69eef55624708b423473218
Pointer size: 130 Bytes
Size of remote file: 52.1 kB

datasets/CDFv2/YouTube-real/00000/014.png ADDED Viewed

Git LFS Details

SHA256: 775bf1f48de7319a4557071844ad3fb587bc048f0ca1d8a52b696f4167476996
Pointer size: 130 Bytes
Size of remote file: 58.8 kB

datasets/CDFv2/YouTube-real/00000/028.png ADDED Viewed

Git LFS Details

SHA256: c7eb0e9af30cbe4e2cab7c6df54a434405470eca81b416e6f835e65fac8e1fc6
Pointer size: 130 Bytes
Size of remote file: 59 kB

datasets/CDFv2/YouTube-real/00000/043.png ADDED Viewed

Git LFS Details

SHA256: 37adad8dd2df6a8d82f5ea530ab9db464a81037125180e7130ee3fe1bc1ac567
Pointer size: 130 Bytes
Size of remote file: 59.2 kB

datasets/FF/DF/000_003/000.png ADDED Viewed

Git LFS Details

SHA256: b2c605732c6b2152320a986173c1a3dc7544938e948aa46444340831b8018060
Pointer size: 130 Bytes
Size of remote file: 82.6 kB

datasets/FF/DF/000_003/012.png ADDED Viewed

Git LFS Details

SHA256: 91b2bf24d9c3685c28559a5ac8c91b94bbe5b841a563b961f4a7679f40e8f4b8
Pointer size: 130 Bytes
Size of remote file: 83.8 kB

datasets/FF/DF/000_003/025.png ADDED Viewed

Git LFS Details

SHA256: ca387094072caa412a0f683d909084580896f2c40dd669baf41c04166efffa95
Pointer size: 130 Bytes
Size of remote file: 82 kB

datasets/FF/DF/000_003/038.png ADDED Viewed

Git LFS Details

SHA256: 781d4ab6041c8b37c75b7d831f02944cae11b41fb848d5e34c577a142cb3a1a0
Pointer size: 130 Bytes
Size of remote file: 82.2 kB

datasets/FF/F2F/000_003/000.png ADDED Viewed

Git LFS Details

SHA256: a8e8c25ddc42909f3b82aedf939ca5fcf3924d1aeb54ad9dbb507ed97d050692
Pointer size: 130 Bytes
Size of remote file: 82.7 kB

datasets/FF/F2F/000_003/009.png ADDED Viewed

Git LFS Details

SHA256: 8ddccb64c403bb4cd0245f6826699bd53b02c3bc9b5435d3134b4d9aba019d85
Pointer size: 130 Bytes
Size of remote file: 82.1 kB

datasets/FF/F2F/000_003/019.png ADDED Viewed

Git LFS Details

SHA256: e1a08d87e07a821d0972e893e5b7fd7777e901d7b0433d1a1acb21862f349f5f
Pointer size: 130 Bytes
Size of remote file: 82 kB

datasets/FF/F2F/000_003/029.png ADDED Viewed

Git LFS Details

SHA256: da6f3aa04c3e0d6155dacdb4003cfb507f5b600b087e57d8184387c1fbbc76eb
Pointer size: 130 Bytes
Size of remote file: 82 kB

datasets/FF/FS/000_003/000.png ADDED Viewed

Git LFS Details

SHA256: 57b282095be875d9b161360d43cbd4b04d0536c093bb967e6bc6661baf7db361
Pointer size: 130 Bytes
Size of remote file: 82.3 kB

datasets/FF/FS/000_003/009.png ADDED Viewed

Git LFS Details

SHA256: 8481e3a732f811a832598bd39ffb426cd966f4b16478e603c5ec9f5351b732ff
Pointer size: 130 Bytes
Size of remote file: 81.1 kB

datasets/FF/FS/000_003/019.png ADDED Viewed

Git LFS Details

SHA256: bc24297d7825d45f663960f9bc0b7e215e7b67312ea9b00f495d289df432b32b
Pointer size: 130 Bytes
Size of remote file: 81.3 kB

datasets/FF/FS/000_003/029.png ADDED Viewed

Git LFS Details

SHA256: caae8b99b9f0016e7e3c2bbe9500422687bbdcd4ae2305f4de8ff7a05c5bf587
Pointer size: 130 Bytes
Size of remote file: 80.5 kB

datasets/FF/NT/000_003/000.png ADDED Viewed

Git LFS Details

SHA256: ca2b6ce7ea30df50d86855adc4c2dd11b1ba0848153746303e73cb58d7035290
Pointer size: 130 Bytes
Size of remote file: 81.5 kB

datasets/FF/NT/000_003/009.png ADDED Viewed

Git LFS Details

SHA256: 616d27b48e463eeadd9f10a3b6294b20d28b71393cbca66c7e5d91c626644f01
Pointer size: 130 Bytes
Size of remote file: 80.5 kB

datasets/FF/NT/000_003/019.png ADDED Viewed

Git LFS Details

SHA256: ec4ac46bf3f06d5de47af5bb7e7af691fc3dd46199f8d5216cdbabe65f311faf
Pointer size: 130 Bytes
Size of remote file: 80.2 kB

datasets/FF/NT/000_003/029.png ADDED Viewed

Git LFS Details

SHA256: 37a3fd9de8ea981445ca036dfc4ba44d8d1ff42163e14972563e6c5485b57af2
Pointer size: 130 Bytes
Size of remote file: 80.1 kB

datasets/FF/real/000/000.png ADDED Viewed

Git LFS Details

SHA256: 33813fa2f7a716f20f27f11bf7e4126c53136108bec3db92dd6001e9453b185a
Pointer size: 130 Bytes
Size of remote file: 84 kB

datasets/FF/real/000/012.png ADDED Viewed

Git LFS Details

SHA256: d3af45388e764ae25e9a176a9ed148004d4e155146c9364199852a96815aafaf
Pointer size: 130 Bytes
Size of remote file: 84.8 kB

datasets/FF/real/000/025.png ADDED Viewed

Git LFS Details

SHA256: 2b027fe48cf3468ad596acf3f8f9db59374dce243f57d4abdb89ab557e6681b6
Pointer size: 130 Bytes
Size of remote file: 82.4 kB

datasets/FF/real/000/038.png ADDED Viewed

Git LFS Details

SHA256: fc521c3b01c76428d353d734fde85969613e450666135881029c1e4e5cd38ce1
Pointer size: 130 Bytes
Size of remote file: 83.7 kB
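
Each PNG above is stored through Git LFS, so the repository itself holds only a small text pointer per image. As a sketch of why the reported pointer size is 130 bytes, the snippet below builds a pointer in the layout described by the Git LFS specification (a `version` line, an `oid sha256:` line, and a `size` line). The function name `lfs_pointer` is hypothetical, and the byte size 51800 is an assumed value, since the page only reports the rounded figure 51.8 kB.

```python
def lfs_pointer(sha256_hex: str, size_bytes: int) -> str:
    """Return the text of a Git LFS pointer file for the given object."""
    if len(sha256_hex) != 64:
        raise ValueError("expected a 64-character hex SHA-256 digest")
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{sha256_hex}\n"
        f"size {size_bytes}\n"
    )


# SHA256 copied from the listing for datasets/CDFv2/Celeb-real/id0_0000/000.png.
# The exact size in bytes is an assumption (the page shows only "51.8 kB").
pointer = lfs_pointer(
    "33a652cb6ad545d41465a978ab4bb02137380db1747a1a460d193a3e0ecd4db6", 51800
)
print(len(pointer.encode("utf-8")))  # 130 for any five-digit size
```

All images in this commit are between roughly 50 kB and 85 kB, so their sizes have five digits, which would explain why every pointer is reported as 130 bytes.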

detector.py ADDED Viewed

	@@ -0,0 +1,701 @@

+import argparse
+import heapq
+import os
+import subprocess
+from concurrent.futures import ThreadPoolExecutor
+from glob import glob
+import cv2
+import numpy as np
+from tqdm import tqdm
+from src.retinaface import RetinaFace, prepare_model
+def max_spread_permutation_pq(N, start=0):
+    """
+    Generate a permutation of 0..N-1 such that at each step
+    the next element is the one whose minimum distance to
+    all previously chosen elements is maximized, using a
+    priority queue to speed up selection.
+    Args:
+        N (int): Length of the permutation.
+        start (int): The first element in the permutation (default 0).
+    Returns:
+        List[int]: A list representing the permutation.
+    """
+    if not (0 <= start < N):
+        raise ValueError("`start` must be in the range [0, N-1]")
+    # Initialize chosen list and distance map
+    chosen = [start]
+    dist = {i: abs(i - start) for i in range(N) if i != start}
+    # Build a max-heap (use negative distances for heapq)
+    heap = [(-d, i) for i, d in dist.items()]
+    heapq.heapify(heap)
+    # Greedily pick elements
+    while heap:
+        # Pop until we find a valid (up-to-date) entry
+        while True:
+            neg_d, candidate = heapq.heappop(heap)
+            current = -neg_d
+            # Only accept if it matches the latest dist
+            if dist.get(candidate, -1) == current:
+                break
+        # Add the selected candidate
+        chosen.append(candidate)
+        # Remove it from dist-map
+        del dist[candidate]
+        # Update distances for remaining elements
+        for other in list(dist.keys()):
+            new_d = abs(other - candidate)
+            if new_d < dist[other]:
+                dist[other] = new_d
+                heapq.heappush(heap, (-new_d, other))
+    return chosen
+def get_video_frames_generator(
+    source_path: str,
+    mask_path: str,
+    stride: int = 1,
+    num_frames=32,
+    mode="at_least",
+):
+    video = cv2.VideoCapture(source_path)
+    if not video.isOpened():
+        print(f"Warning: Video {source_path} cannot be opened!")
+        return
+    video_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
+    if mask_path is not None:
+        mask_video = cv2.VideoCapture(mask_path)
+        if not mask_video.isOpened():
+            print(f"Warning: Mask video {mask_path} cannot be opened!")
+            return
+        mask_frames = int(mask_video.get(cv2.CAP_PROP_FRAME_COUNT))
+        if video_frames != mask_frames:
+            print(
+                f"Warning: {source_path} and {mask_path} have different number of frames {video_frames} vs {mask_frames}!"
+            )
+        total_frames = min(video_frames, mask_frames)
+    else:
+        mask_video = None
+        total_frames = video_frames
+    if not video.isOpened():
+        raise Exception(f"Could not open video at {source_path}")
+    # Get the mode
+    if mode == "fixed_num_frames":
+        # Sample `num_frames` frame indices evenly spaced over the whole video (first and last frame included)
+        frame_ids = np.linspace(0, total_frames - 1, num_frames, endpoint=True, dtype=int)
+    elif mode == "fixed_stride":
+        # Take every `stride`-th frame, starting from the first one
+        frame_ids = np.arange(0, total_frames, stride, dtype=int)
+    elif mode == "at_least":
+        frame_ids = max_spread_permutation_pq(total_frames, start=total_frames // 2)
+    else:
+        raise ValueError(f"Invalid mode: {mode}. Choose 'fixed_num_frames', 'fixed_stride', or 'at_least'.")
+    # Iterate through the selected frame IDs
+    for frame_id in frame_ids:
+        # Set the video capture position to the desired frame
+        video.set(cv2.CAP_PROP_POS_FRAMES, frame_id)
+        success, frame = video.read()
+        # Check that the frame was read, with or without a mask video
+        if not success:
+            print(f"Warning: Failed to read frame {frame_id} of {source_path}. Skipping.")
+            continue
+        mask = None
+        if mask_video is not None:
+            mask_video.set(cv2.CAP_PROP_POS_FRAMES, frame_id)
+            success_mask, mask = mask_video.read()
+            if not success_mask:
+                print(f"Warning: Failed to read mask frame {frame_id} of {mask_path}. Skipping.")
+                continue
+        yield frame, frame_id, mask
+    # Release the video capture object
+    video.release()
+    if mask_video is not None:
+        mask_video.release()
+def align_face(
+    img: np.ndarray,
+    landmarks: np.ndarray,
+    target_size: None | tuple = None,
+    scale: float = 1.3,
+    mask: np.ndarray = None,
+):
+    """
+    Aligns a face based on 5-point facial landmarks (eyes, nose, mouth corners).
+    Args:
+        img: Input image containing the face
+        landmarks: 5-point facial landmarks array with shape (5, 2)
+        target_size: Desired output size as a (width, height) tuple; if None, it is estimated from the landmark distances
+        scale: Scaling factor to control how much context around the face to include
+        mask: Optional mask, warped with the same transform as img
+    Returns:
+        Tuple (aligned_img, aligned_mask); aligned_mask is None if no mask was given
+    """
+    """
+    dst = np.array(
+        [
+            [0.34, 0.46],
+            [0.66, 0.46],
+            [0.5, 0.64],
+            [0.37, 0.82],
+            [0.63, 0.82],
+        ],
+        dtype=np.float32,
+    )
+    if target_size is None:
+        # Compute desired distances between all pairs
+        desired_dists = np.linalg.norm(landmarks[:, None, :] - landmarks[None, :, :], axis=-1)
+        # Destination distances between all pairs
+        dst_dists = np.linalg.norm(dst[:, None, :] - dst[None, :, :], axis=-1)
+        # Take upper triangle of the distance matrix
+        upper_triangle_indices = np.triu_indices(len(dst), k=1)
+        dst_dists = dst_dists[upper_triangle_indices]
+        desired_dists = desired_dists[upper_triangle_indices]
+        # Approximate target size
+        approx_size = np.round(np.mean(desired_dists / dst_dists) * scale).astype(int)
+        target_size = (approx_size, approx_size)
+    dst[:, 0] = dst[:, 0] * target_size[0]
+    dst[:, 1] = dst[:, 1] * target_size[1]
+    margin_rate = scale - 1
+    x_margin = target_size[0] * margin_rate / 2.0
+    y_margin = target_size[1] * margin_rate / 2.0
+    # move
+    dst[:, 0] += x_margin
+    dst[:, 1] += y_margin
+    # resize
+    dst[:, 0] *= target_size[0] / (target_size[0] + 2 * x_margin)
+    dst[:, 1] *= target_size[1] / (target_size[1] + 2 * y_margin)
+    src = landmarks.astype(np.float32)
+    M = cv2.estimateAffinePartial2D(src, dst, method=cv2.LMEDS)[0]
+    img = cv2.warpAffine(img, M, target_size, flags=cv2.INTER_LINEAR)
+    # Warp landmarks, show
+    # landmarks = cv2.transform(np.expand_dims(landmarks, axis=0), M)[0]
+    # for point in landmarks:
+    #     cv2.circle(img, tuple(point.astype(int)), 2, (0, 255, 0), -1)
+    if mask is not None:
+        mask = cv2.warpAffine(mask, M, target_size, flags=cv2.INTER_NEAREST)
+    return img, mask
+def process_video(
+    source_path,
+    target_path,
+    mask_path,
+    model: RetinaFace,
+    scale=1.3,
+    target_size=(256, 256),
+    stride=1,
+    num_frames=32,
+    mode="at_least",
+    skip_processed_videos=False,
+    skip_processed_frames=False,
+):
+    frame_save_path = target_path.replace(".mp4", "/frames")
+    # Skip if frame_save_path exists
+    if skip_processed_videos and os.path.exists(frame_save_path):
+        print(f"Frames for {source_path} already processed.")
+        return
+    else:
+        print(f"Processing {source_path}")
+    # Create a frame generator from video path for iteration of frames
+    frame_generator = get_video_frames_generator(
+        source_path,
+        mask_path,
+        stride=stride,
+        num_frames=num_frames,
+        mode=mode,
+    )
+    # desc = f"Processing {os.path.basename(source_path)}"
+    num_saved = 0
+    for frame, frame_id, mask in frame_generator:
+        frame_filename = os.path.join(frame_save_path, f"frame_{frame_id:04d}.png")
+        if skip_processed_frames and os.path.exists(frame_filename):
+            print(f"Frame {frame_id} of {source_path} already processed.")
+            num_saved += 1
+            if mode in ["fixed_stride", "at_least"] and num_saved >= num_frames and num_frames != -1:
+                break
+            continue
+        try:
+            preds = model.detect(frame)
+        except Exception as e:
+            print(f"Error during detection: {e}")
+            continue
+        xyxy, landmarks = preds
+        if len(xyxy) == 0:
+            print(f"No faces detected in frame {frame_id} of {source_path}")
+            continue
+        selected_landmarks = None
+        if mask is not None:
+            # It is possible that the mask is empty -> skip this frame
+            if mask.sum() == 0:
+                print(f"Warning: Mask is empty for frame {frame_id} of {source_path}. Skipping.")
+                continue
+            # Convert mask to grayscale if it's not already
+            mask_img = cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY) if len(mask.shape) == 3 else mask
+            # Threshold the mask to create a binary mask
+            mask_img = cv2.threshold(mask_img, 1, 255, cv2.THRESH_BINARY)[1]
+            # Find the face that intersects the most with the mask
+            best_landmarks = None
+            max_intersection = 0
+            for i in range(len(xyxy)):
+                # Get the bounding box coordinates
+                x1, y1, x2, y2 = xyxy[i, :4].astype(int)
+                # Create a mask for the face
+                face_mask = np.zeros_like(mask_img)
+                face_mask[y1:y2, x1:x2] = 255
+                # Calculate the intersection between the face mask and the provided mask
+                intersection = np.sum(np.logical_and(face_mask, mask_img))
+                # Update the best face if the intersection is greater than the current maximum
+                if intersection > max_intersection:
+                    max_intersection = intersection
+                    best_landmarks = landmarks[i]
+            # If a face was found, use it; otherwise, skip this frame
+            if best_landmarks is not None:
+                selected_landmarks = best_landmarks
+            else:
+                print(f"No suitable face found in frame {frame_id} of {source_path} with the provided mask.")
+                continue
+        # """
+        # Select landmarks of the largest face if not using mask
+        if selected_landmarks is None:
+            areas = (xyxy[:, 2] - xyxy[:, 0]) * (xyxy[:, 3] - xyxy[:, 1])
+            idx = np.argmax(areas)
+            selected_landmarks = landmarks[idx]
+        # Show all landmarks
+        # for L, B in zip(landmarks, xyxy):
+        #     for point in L:
+        #         cv2.circle(frame, tuple(point.astype(int)), 2, (0, 255, 0), -1)
+        #     cv2.rectangle(frame, tuple(B[0:2].astype(int)), tuple(B[2:4].astype(int)), (0, 255, 0), 2)
+        # Align the face
+        aligned_face, _ = align_face(frame, selected_landmarks, target_size=target_size, scale=scale)
+        # Save the aligned face
+        os.makedirs(frame_save_path, exist_ok=True)
+        cv2.imwrite(frame_filename, aligned_face)
+        # """
+        num_saved += 1
+        if mode in ["fixed_stride", "at_least"] and num_saved >= num_frames and num_frames != -1:
+            break
+    if num_saved == 0:
+        print(f"No faces were saved from {source_path}. Check the detection threshold or input video.")
+    return frame_save_path
+def process_image(
+    source_path,
+    target_path,
+    model: RetinaFace,
+    scale=1.3,
+    target_size=(256, 256),
+    skip_processed_frames=False,
+):
+    """Processes a single image file."""
+    if skip_processed_frames and os.path.exists(target_path):
+        print(f"Image {source_path} already processed.")
+        return target_path
+    else:
+        print(f"Processing {source_path}")
+    img = cv2.imread(source_path)
+    if img is None:
+        print(f"Failed to read image {source_path}")
+        return None
+    try:
+        preds = model.detect(img)
+    except Exception as e:
+        print(f"Error during detection: {e}")
+        return None
+    xyxy, landmarks = preds
+    if len(xyxy) == 0:
+        print(f"No faces detected in {source_path}")
+        return None
+    # Select landmarks of the largest face
+    areas = (xyxy[:, 2] - xyxy[:, 0]) * (xyxy[:, 3] - xyxy[:, 1])
+    idx = np.argmax(areas)
+    landmarks = landmarks[idx]
+    # Align the face
+    aligned_face, _ = align_face(img, landmarks, target_size=target_size, scale=scale)
+    # Save the aligned face
+    os.makedirs(os.path.dirname(target_path), exist_ok=True)
+    cv2.imwrite(target_path, aligned_face)
+    return target_path
+def get_output_path(source_path, input_folder, output_folder):
+    # Example: source_path = input_folder + new_source_path
+    new_source_path = source_path.replace(input_folder, os.path.basename(input_folder))
+    # Create directory for each video
+    new_source_path = new_source_path.replace(".mp4", "")
+    # Place it in the output folder
+    output_path = os.path.join(output_folder, new_source_path)
+    return output_path
+def get_mask_path(input_folder, input_mask_folder, source_path):
+    if input_mask_folder is not None:
+        # Change the input folder to the mask folder
+        source_path = source_path.replace(input_folder, input_mask_folder)
+        #! FF++ has masks named the same way as original videos
+        if "FaceForensics" in source_path or "FF++" in source_path:
+            return source_path
+        #! Else assume masks are named with _mask suffix
+        source_path = source_path.replace(".mp4", "_mask.mp4")
+        return source_path
+    return None
+def process_mixed_types(
+    input_folder_or_file: str,
+    input_mask_folder: None | str,
+    model: RetinaFace,
+    num_workers=1,
+    scale=1.3,
+    target_size=(256, 256),
+    stride=1,
+    num_frames=32,
+    mode: str = "fixed_num_frames",
+    output_folder: str = "outputs",
+    possible_extensions: tuple[str, ...] = ("mp4", "jpg", "png", "jpeg"),
+    skip_processed_videos: bool = False,
+    skip_processed_frames: bool = False,
+):
+    # Default to an empty list so an unsupported file type reaches the "No files found" branch
+    files = []
+    if os.path.isfile(input_folder_or_file):
+        # If input is a file
+        if input_folder_or_file.endswith(possible_extensions):
+            # If input is a media file
+            files = [input_folder_or_file]
+        elif input_folder_or_file.endswith("txt"):
+            # If input is a txt file listing one path per line
+            with open(input_folder_or_file, "r") as f:
+                files = f.read().splitlines()
+    else:
+        # If input is a folder
+        files = find_files(input_folder_or_file, possible_extensions)
+    if not files:
+        print(f"No files found in {input_folder_or_file}")
+        return
+    def process(source_path):
+        output_path = get_output_path(source_path, input_folder_or_file, output_folder)
+        if source_path.endswith(".mp4"):
+            mask_path = get_mask_path(input_folder_or_file, input_mask_folder, source_path)
+            try:
+                return process_video(
+                    source_path,
+                    output_path,
+                    mask_path,
+                    model,
+                    scale=scale,
+                    target_size=target_size,
+                    stride=stride,
+                    num_frames=num_frames,
+                    mode=mode,
+                    skip_processed_videos=skip_processed_videos,
+                    skip_processed_frames=skip_processed_frames,
+                )
+            except Exception as e:
+                print(f"Error processing video {source_path}: {e}")
+        else:
+            try:
+                return process_image(
+                    source_path,
+                    output_path,
+                    model,
+                    scale=scale,
+                    target_size=target_size,
+                    skip_processed_frames=skip_processed_frames,
+                )
+            except Exception as e:
+                print(f"Error processing image {source_path}: {e}")
+    files = sorted(files)  # Sort files for consistent processing
+    with ThreadPoolExecutor(max_workers=num_workers) as executor:
+        futures = [executor.submit(process, file) for file in files]
+        for future in tqdm(futures, desc=f"Processing files in {input_folder_or_file}", leave=True):
+            future.result()
+    print("Processing complete.")
+def find_files_fd(start_dir, extensions):
+    """
+    Finds files with given extensions recursively using the 'fd' command-line tool.
+    Args:
+        start_dir (str): The directory to start searching from.
+        extensions (list): A list of file extensions without the leading dot (e.g., ['png', 'jpg']).
+    Returns:
+        list: A list of full path strings for each found file. Returns empty list if fd fails.
+    Raises:
+        FileNotFoundError: If the 'fd' command is not found in the system's PATH.
+    """
+    if not os.path.isdir(start_dir):
+        print(f"Error: Start directory not found: {start_dir}")
+        return []
+    try:
+        # Build the command. Use -e for each extension.
+        command = ["fd", "--type", "f", "--type", "l"]  # Find only files or links to files
+        for ext in extensions:
+            # fd expects extensions without the dot
+            command.extend(["--extension", ext])
+        # Add the pattern ('.' matches everything, filtering is done by extension)
+        # and the directory to search
+        command.extend([".", start_dir])
+        # Run the command
+        result = subprocess.run(
+            command,
+            capture_output=True,  # Capture stdout and stderr
+            text=True,  # Decode output as text (UTF-8 by default)
+            check=False,  # Do not raise exception on non-zero exit code automatically
+            encoding="utf-8",  # Be explicit about encoding
+        )
+        # Check if fd ran successfully
+        if result.returncode != 0:
+            # fd returns specific exit codes, e.g., 1 for errors, 2 if pattern not found (but we use '.')
+            # We mainly care if the command executed but maybe found nothing or had an issue.
+            # Check stderr for actual errors.
+            if result.stderr:
+                print(f"Error running fd (code {result.returncode}): {result.stderr.strip()}")
+            # If stderr is empty but code isn't 0, it might just mean no files found, which is okay.
+            # We return an empty list in case of errors or no files found.
+            return []  # Return empty list on error or if no files found
+        # fd outputs one path per line. Split the output.
+        # .strip() removes potential leading/trailing whitespace/newlines
+        file_list = result.stdout.strip().splitlines()
+        return file_list
+    except FileNotFoundError:
+        raise  # Re-raise the exception so the caller knows fd is missing
+    except Exception as e:
+        print(f"An unexpected error occurred while running fd: {e}")
+        return []  # Return empty list on other unexpected errors
+def find_files_glob(start_dir, extensions):
+    """
+    Finds files with given extensions recursively using glob.
+    Args:
+        start_dir (str): The directory to start searching from.
+        extensions (list): A list of file extensions without the leading dot (e.g., ['png', 'jpg']).
+    Returns:
+        list: A list of full path strings for each found file.
+    """
+    files = []
+    for ext in extensions:
+        # Extensions come without the leading dot, so add it to the pattern
+        files.extend(glob(f"{start_dir}/**/*.{ext}", recursive=True))
+    return sorted(f for f in files if os.path.isfile(f))
+def find_files(start_dir, extensions):
+    try:
+        return find_files_fd(start_dir, extensions)
+    except Exception:
+        return find_files_glob(start_dir, extensions)
+def get_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "-i",
+        "--input_folder_or_file",
+        type=str,
+        required=True,
+        help="Path to an input folder, a single video or image file, or a .txt file listing one path per line.",
+    )
+    parser.add_argument(
+        "--mask_folder",
+        type=str,
+        default=None,
+        help="Path to the input folder containing masks (optional).",
+    )
+    parser.add_argument(
+        "--num_workers",
+        type=int,
+        default=8,
+        help="Number of worker threads.",
+    )
+    parser.add_argument(
+        "-s",
+        "--scale",
+        type=float,
+        default=1.3,
+        help="Scale factor for face alignment.",
+    )
+    parser.add_argument(
+        "--target_size",
+        type=str,
+        default="256,256",
+        help="Target size for aligned faces as width, height (e.g., 256,256) or 'none'.",
+    )
+    parser.add_argument(
+        "--det_thres",
+        type=float,
+        default=0.4,
+        help="Detection threshold for RetinaFace.",
+    )
+    parser.add_argument(
+        "-m",
+        "--mode",
+        type=str,
+        default="at_least",
+        choices=["fixed_num_frames", "fixed_stride", "at_least"],
+        help="Mode for frame extraction from videos ('fixed_num_frames', 'fixed_stride', or 'at_least').",
+    )
+    parser.add_argument(
+        "--stride",
+        type=int,
+        default=1,
+        help="Stride for frame extraction from videos (only used in 'fixed_stride' mode).",
+    )
+    parser.add_argument(
+        "-n",
+        "--num_frames",
+        type=int,
+        default=32,
+        help="Maximum number of frames to extract from each video, -1 for all frames.",
+    )
+    parser.add_argument(
+        "-o",
+        "--output_folder",
+        type=str,
+        default="outputs",
+        help="Output folder for the preprocessed images.",
+    )
+    parser.add_argument(
+        "--skip_processed_videos",
+        action="store_true",
+        help="Skip videos that have already been processed.",
+    )
+    parser.add_argument(
+        "--skip_processed_frames",
+        action="store_true",
+        help="Skip frames that have already been processed.",
+    )
+    args = parser.parse_args()
+    args.target_size = parse_target_size(args.target_size)
+    return args
+def parse_target_size(target_size_str):
+    try:
+        width, height = map(int, target_size_str.split(","))
+        return (width, height)
+    except ValueError:
+        if "none" in target_size_str.lower():
+            return None
+        raise ValueError("Invalid target_size format. Use 'width,height' or 'none'.")
+def main():
+    args = get_args()
+    model = prepare_model(args.det_thres)
+    process_mixed_types(
+        input_folder_or_file=args.input_folder_or_file,
+        input_mask_folder=args.mask_folder,
+        model=model,
+        num_workers=args.num_workers,
+        scale=args.scale,
+        target_size=args.target_size,
+        stride=args.stride,
+        num_frames=args.num_frames,
+        mode=args.mode,
+        output_folder=args.output_folder,
+        skip_processed_videos=args.skip_processed_videos,
+        skip_processed_frames=args.skip_processed_frames,
+    )
+    raise SystemExit(0)
+if __name__ == "__main__":
+    main()
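
The path handling in `get_output_path` and `get_mask_path` above is easiest to follow on a concrete input. The sketch below copies the two helpers as standalone functions and runs them on hypothetical paths (`data/videos`, `data/masks` and `outputs` are assumptions made for illustration, not paths from the repository).

```python
import os


def get_output_path(source_path, input_folder, output_folder):
    # Keep only the last component of input_folder as the new prefix
    relative = source_path.replace(input_folder, os.path.basename(input_folder))
    # A video maps to a directory that will hold its extracted frames
    relative = relative.replace(".mp4", "")
    return os.path.join(output_folder, relative)


def get_mask_path(input_folder, input_mask_folder, source_path):
    if input_mask_folder is None:
        return None
    # Swap the input folder for the mask folder
    mask_path = source_path.replace(input_folder, input_mask_folder)
    # FF++ masks share the name of the original video
    if "FaceForensics" in mask_path or "FF++" in mask_path:
        return mask_path
    # Otherwise masks are assumed to carry a _mask suffix
    return mask_path.replace(".mp4", "_mask.mp4")


# Hypothetical paths, chosen only for illustration
video = "data/videos/id0/clip.mp4"
out = get_output_path(video, "data/videos", "outputs")
mask = get_mask_path("data/videos", "data/masks", video)
print(out)
print(mask)
```

Both helpers rely on plain `str.replace`, so an input folder passed with a trailing slash yields an empty `os.path.basename` and a different output layout. Passing folders without the trailing slash avoids that.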

pyproject.toml ADDED

	@@ -0,0 +1,37 @@

+[tool.ruff]
+line-length = 120
+[tool.ruff.lint]
+ignore = [
+    "C901", # function is too complex (mccabe)
+    "E501", # line too long
+    "F401", # imported but unused
+    "F403", # from module import * used; unable to detect undefined names
+    "F405", # name may be undefined, or defined from star imports: module
+    "E741", # ambiguous variable name
+]
+select = [
+    "C", # flake8-comprehensions (C4) and mccabe (C90)
+    "E", "W", # pycodestyle
+    "F", # pyflakes
+    "I", # isort
+]
+[tool.ruff.lint.isort]
+force-to-top = ["autoroot", "autorootcwd"]
+[tool.ruff.lint.per-file-ignores]
+"**/__init__.py" = ["E402"]
+[tool.pyright]
+exclude = [
+    "**/__pycache__",
+    "wandb",
+    "datasets",
+    "outputs",
+    "runs",
+    "tmp",
+    "logs",
+]
+typeCheckingMode = "off"

requirements.txt ADDED

	@@ -0,0 +1,34 @@

+torch==2.8.0
+torchaudio==2.8.0
+torchvision==0.23.0
+lightning==2.5.5
+transformers==4.56.2
+tqdm==4.67.1 # progress bar
+timm==1.0.20 # torch models
+matplotlib==3.10.6 # visualization
+seaborn==0.13.2  # visualization
+scikit-learn==1.6.1 # metrics
+rich==14.1.0 # logging
+wandb==0.22.0 # logging
+pydantic==2.11.9 # config
+# albumentations==1.4.17 # augmentations
+ruff==0.13.2 # formatting
+fire==0.7.0 # CLI
+pytorch-metric-learning==2.8.1 # losses
+peft==0.15.2 # parameter-efficient fine-tuning
+ipykernel==6.30.1 # jupyter
+autoroot==1.0.1 # root utils
+autorootcwd==1.0.1 # root utils
+xformers==0.0.32.post2 # RADIOv2.5/3
+einops==0.8.1 # RADIOv2.5/3
+open-clip-torch==2.32.0 # RADIOv2.5/3
+grad-cam==1.5.5 # for Grad-CAM visualization
+mediapipe==0.10.21 # Face landmark detection
+# --- for detector.py ---
+# opencv-python==4.11.0.86 # mainly only for detector.py
+opencv-python-headless==4.12.0.88 # mainly for `detector.py`
+onnxruntime-gpu==1.21.0 # for ONNX model inference
+# --- for app/run.py ---
+gradio==5.49.1

run.py ADDED

	@@ -0,0 +1,174 @@

+import os
+import traceback
+import torch
+from lightning import Trainer
+from lightning.pytorch import callbacks as pl_callbacks
+from lightning.pytorch import loggers as pl_loggers
+from rich import traceback as rich_traceback
+from src import dataset as datasets
+from src.config import Config
+from src.model.base import BaseDeepakeDetectionModel
+from src.utils import logger
+from src.utils.checks import checks
+from src.utils.model_checkpoint import ModelCheckpointParallel
+rich_traceback.install()
+def load_third_party_model(config: Config) -> BaseDeepakeDetectionModel:
+    if "weights/Effort" in config.checkpoint:
+        # Download: https://drive.google.com/drive/folders/19kQwGDjF18uk78EnnypxxOLaG4Aa4v1h
+        from src.model.Effort import Effort
+        return Effort(config)
+    if "weights/ForAda" in config.checkpoint:
+        # Download: https://drive.usercontent.google.com/download?id=1UlaAUTtsX87ofIibf38TtfAKIsnA7WVm&export=download&authuser=0
+        from src.model.ForAda import ForAda
+        return ForAda(config)
+    if "weights/FS-VFM/" in config.checkpoint:
+        from src.model.FSFM import FSFM
+        return FSFM(config)
+    if "yermandy/" in config.checkpoint:
+        # https://huggingface.co/yermandy/models
+        from src.model.GenDHF import GenDHF
+        return GenDHF(config)
+    raise ValueError(f"Unknown third party model in checkpoint path: {config.checkpoint}")
+def load_model(config: Config) -> BaseDeepakeDetectionModel:
+    # If no checkpoint is provided, use GenD as default
+    if config.checkpoint is None or config.checkpoint == "":
+        from src.model.GenD import GenD
+        return GenD(config, verbose=True)
+    # Try to load third party model
+    try:
+        return load_third_party_model(config)
+    except ValueError:
+        # If not a third party model, use GenD as default
+        from src.model.GenD import GenD
+        return GenD(config, verbose=True)
+def init_loggers(config: Config) -> list:
+    save_dir = f"{config.run_dir}/{config.run_name}"
+    loggers: list = [pl_loggers.CSVLogger(config.run_dir, name=config.run_name, version="")]
+    if config.wandb:
+        wandb_logger = pl_loggers.WandbLogger(
+            project="deepfake",
+            name=config.run_name,
+            save_dir=save_dir,
+            tags=set(config.wandb_tags),
+            group=config.wandb_group,
+        )
+        loggers.append(wandb_logger)
+    return loggers
+def init_callbacks(config: Config) -> list:
+    callbacks = [
+        pl_callbacks.RichProgressBar(leave=True),
+        ModelCheckpointParallel(
+            filename=config.checkpoint_name, monitor=config.monitor_metric, mode=config.monitor_metric_mode
+        ),
+    ]
+    # pl_callbacks.LearningRateFinder(1e-5, 1e-2),
+    if config.early_stopping_patience > 0:
+        callbacks.append(
+            pl_callbacks.EarlyStopping(
+                monitor=config.monitor_metric,
+                patience=config.early_stopping_patience,
+                mode=config.monitor_metric_mode,
+                verbose=True,
+            )
+        )
+    return callbacks
+def finish_wandb_run(trainer, config: Config):
+    if config.wandb:
+        if any(isinstance(l, pl_loggers.WandbLogger) for l in trainer.loggers):
+            wandb_logger = [l for l in trainer.loggers if isinstance(l, pl_loggers.WandbLogger)][0]
+            wandb_logger.finalize("success")
+            wandb_logger.experiment.finish()
+def main(config: Config, train: bool):
+    # Performs initial checks
+    checks(config)
+    # Set the precision for matmul operations
+    torch.set_float32_matmul_precision("high")
+    # Instantiates the model
+    model = load_model(config)
+    # Loads the checkpoint if provided
+    model.load_checkpoint(config.checkpoint)
+    data_module = datasets.DeepfakeDataModule(config, model.get_preprocessing())
+    save_dir = f"{config.run_dir}/{config.run_name}"
+    trainer = Trainer(
+        devices=config.devices,
+        max_epochs=config.max_epochs,
+        precision=config.precision,
+        accumulate_grad_batches=config.batch_size // config.mini_batch_size,
+        fast_dev_run=config.fast_dev_run,
+        log_every_n_steps=100,
+        overfit_batches=config.overfit_batches,
+        limit_train_batches=config.limit_train_batches,
+        limit_val_batches=config.limit_val_batches,
+        limit_test_batches=config.limit_test_batches,
+        deterministic=config.deterministic,
+        detect_anomaly=config.detect_anomaly,
+        logger=init_loggers(config),
+        callbacks=init_callbacks(config),
+        default_root_dir=config.run_dir,
+    )
+    if train:
+        try:
+            trainer.fit(model, data_module)
+        except KeyboardInterrupt:
+            logger.print_warning("Training interrupted")
+        except Exception as e:
+            traceback.print_exc()  # Print complete exception traceback
+            logger.print_error(f"Training failed: {e}")
+            # Save the exception traceback to a file (create the run directory if it is missing)
+            os.makedirs(save_dir, exist_ok=True)
+            with open(f"{save_dir}/failed.log", "a") as f:
+                f.write(f"Training failed: {e}\n")
+                f.write(traceback.format_exc())
+        finally:
+            logger.print_info("Training finished. Starting testing")
+            ckpt_path = f"{save_dir}/checkpoints/{config.checkpoint_name}.ckpt"
+            if not os.path.exists(ckpt_path):
+                logger.print_error(f"Checkpoint {ckpt_path} does not exist. Cannot proceed with testing.")
+            else:
+                model.load_checkpoint(ckpt_path)
+                trainer.test(model, data_module)
+    else:
+        assert config.checkpoint is not None, "Checkpoint is required for testing"
+        trainer.test(model, data_module)
+    # Finish wandb run
+    finish_wandb_run(trainer, config)
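
The `Trainer` above derives `accumulate_grad_batches` from `config.batch_size // config.mini_batch_size`. The sketch below reproduces only that arithmetic, to show that integer division silently shrinks the effective batch when the two values do not divide evenly. The helper names `accumulation_steps` and `effective_batch_size` are hypothetical and do not exist in the repository.

```python
def accumulation_steps(batch_size: int, mini_batch_size: int) -> int:
    """Number of mini-batches accumulated before one optimizer step."""
    return batch_size // mini_batch_size


def effective_batch_size(batch_size: int, mini_batch_size: int) -> int:
    """Samples that actually contribute to one optimizer step on one device."""
    return accumulation_steps(batch_size, mini_batch_size) * mini_batch_size


print(accumulation_steps(128, 128))   # 1: no accumulation, as in the default train config
print(accumulation_steps(128, 32))    # 4
print(effective_batch_size(100, 32))  # 96: four samples per step are lost to truncation
```

Choosing `batch_size` as a multiple of `mini_batch_size` keeps the effective batch equal to the configured one.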

run_exp.py ADDED

	@@ -0,0 +1,209 @@

+import traceback
+from copy import deepcopy
+import fire
+from run import main
+from src import config as C
+from src.config import Config
+from src.exp import experiments
+from src.utils import files, logger
+def get_val_files():
+    return [
+        *files.DeepSpeak_v2.my_val,
+        *files.DeepSpeak_v1_1.my_val,
+        *files.CDFv2.val,
+        *files.FFIW.val,
+    ]
+def get_test_files():
+    return {
+        "FF": files.FF.test,
+        "FF-DF": files.FF.DF.test,
+        "FF-F2F": files.FF.F2F.test,
+        "FF-FS": files.FF.FS.test,
+        "FF-NT": files.FF.NT.test,
+        "CDF": files.CDFv2.test,
+        "FaceFusion": files.FaceFusion.CDF.test,
+        "DFD": files.DFD.test,
+        "DFDC": files.DFDC.test,
+        "FSh": files.FSh.test,
+        "UADFV": files.UADFV.test,
+        "DFDM": files.DFDM.test,
+        "FFIW": files.FFIW.test,
+        "DeepSpeak-1.1": files.DeepSpeak_v1_1.test,
+        "DeepSpeak-2.0": files.DeepSpeak_v2.test,
+        "KoDF": files.KoDF.test,
+        "KoDF-adv": files.KoDF.adversarial,
+        "FakeAVCeleb": files.FakeAVCeleb.test,
+        "FAVC-FV-RA-WL": files.FakeAVCeleb.FV_RA_WL.test,
+        "FAVC-FV-FA-FS": files.FakeAVCeleb.FV_FA_FS.test,
+        "FAVC-FV-FA-GAN": files.FakeAVCeleb.FV_FA_GAN.test,
+        "FAVC-FV-FA-WL": files.FakeAVCeleb.FV_FA_WL.test,
+        "PolyGlotFake": files.PolyGlotFake.test,
+        "IDForge-v1": files.IDForge_v1.test,
+    } | {
+        k: v.map(lambda x: x.replace("/CDFv3/", "/CDFv3-x1.3-th0.5-all/subset/uniform-32-frames/"))
+        for k, v in files.CDFv3.get_test_dict().items()
+    }
+def get_default_train_config() -> Config:
+    config = Config()
+    config.run_dir = "runs/rebuttal"
+    config.wandb = True
+    config.wandb_tags.append("rebuttal")
+    config.throw_exception_if_run_exists = True
+    config.num_workers = 12
+    config.devices = "auto"
+    config.backbone = C.Backbone.CLIP_L_14
+    config.freeze_feature_extractor = True
+    config.num_classes = 2
+    config.batch_size = config.mini_batch_size = 128
+    config.lr_scheduler = "cosine"
+    config.lr = 3e-4
+    config.min_lr = 1e-5
+    config.weight_decay = 0
+    config.max_epochs = 1 + 50
+    config.warmup_epochs = 1
+    config.trn_files = files.FF.train
+    config.val_files = get_val_files()
+    config.tst_files = get_test_files()
+    return config
+def get_default_test_config(orig_run_name, new_run_name) -> Config:
+    orig_run_dir = files.find_run_dir(orig_run_name)
+    orig_config_path = f"{orig_run_dir}/hparams.yaml"
+    checkpoint = "best_mAP.ckpt"  # Default checkpoint name
+    # Load run specific config
+    config = C.load_config(orig_config_path)
+    config.run_name = new_run_name  # Rename the run
+    config.run_dir = "runs/test"  # Set default test dir
+    config.checkpoint = f"{orig_run_dir}/checkpoints/{checkpoint}"
+    config.wandb = True
+    config.wandb_tags.extend(["test"])
+    config.num_workers = 12
+    config.batch_size = config.mini_batch_size = 1024
+    config.devices = "auto"
+    config.tst_files = get_test_files()
+    return config
+def get_debug_config(config: Config) -> Config:
+    #! Debug
+    config.run_dir = "runs/tmp"
+    config.run_name = "tmp"
+    # config.num_workers = 0
+    config.max_epochs = 1
+    config.limit_train_batches = 12
+    config.limit_val_batches = 12
+    config.limit_test_batches = 12
+    # config.batch_size = config.mini_batch_size = 2
+    # config.deterministic = True
+    # config.detect_anomaly = True
+    config.trn_files = files.FF.train
+    config.val_files = files.FF.val
+    config.tst_files = files.FF.val
+    return config
+experiments = {
+    **experiments,  # Include all experiments defined in src.exp
+}
+def entry(
+    exp_names: str | list[str],
+    debug: bool = False,
+    test: bool = False,
+    from_exp: str | None = None,
+    **kwargs,
+):
+    # Normalize to a list first, so exp_names[0] below is a name and not a single character
+    if isinstance(exp_names, str):
+        exp_names = [exp_names]
+    if test:
+        if from_exp is not None:
+            if len(exp_names) != 1:
+                raise ValueError("When running in test mode, you can provide only one experiment name.")
+            config = get_default_test_config(from_exp, exp_names[0])
+        else:
+            logger.print_warning("Running in test mode, but 'from_exp' is not provided. Using default test config.")
+            config = C.Config()
+    else:
+        config = get_default_train_config()
+    for exp_name in exp_names:
+        exp_name = exp_name.strip()
+        if exp_name not in experiments:
+            logger.print_error(f"Experiment '{exp_name}' is not defined in 'src/exp/__init__.py:1'")
+            logger.print(f"Available experiments: {list(experiments.keys())}")
+            continue
+        modifiers = experiments[exp_name]
+        config_exp = deepcopy(config)
+        config_exp.run_name = exp_name
+        for modify in modifiers:
+            if isinstance(modify, Config):
+                # If the modifier is a Config object, change only different values
+                difference = modify.model_dump(exclude_unset=True)
+                # TODO: maybe set_values_from_dict(difference)?
+                config_exp = Config(**config_exp.model_copy(update=difference).model_dump())
+                # config_exp = config_exp.model_copy(update=difference)
+            else:
+                config_exp = modify(config_exp)
+        config_exp = Config(**config_exp.model_dump())  # Parse and validate config
+        if debug:
+            config_exp = config_exp.model_copy(update=get_debug_config(config_exp).model_dump())
+        # Update config with kwargs
+        config_exp.set_values_from_dict(kwargs)
+        # Revalidate the config - checks if user provided valid values
+        config_exp = Config(**config_exp.model_dump())
+        # logger.print(config_exp)
+        # exit()
+        try:
+            main(config_exp, not test)
+        except Exception as e:
+            traceback.print_exc()  # Print complete exception traceback
+            logger.print_error(f"Error occurred while running experiment '{exp_name}':")
+            logger.print(e)
+            save_dir = f"{config_exp.run_dir}/{config_exp.run_name}"
+            # Save the exception traceback to a file
+            with open(f"{save_dir}/failed.log", "a") as f:
+                f.write(f"\nTraining failed: {e}\n")
+                f.write(traceback.format_exc())
+if __name__ == "__main__":
+    fire.Fire(entry)
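
In `entry`, each experiment is a list of modifiers applied in order to a copy of the base config: a modifier is either a partial config whose explicitly set fields override the current values, or a function that takes and returns a config. The sketch below illustrates that pattern with the standard library only. It swaps the pydantic `Config` for a hypothetical `ToyConfig` dataclass and represents a partial override as a plain dict, so it is an approximation of the idea rather than the repository's implementation.

```python
from copy import deepcopy
from dataclasses import dataclass, replace


@dataclass
class ToyConfig:
    # Hypothetical stand-in for src.config.Config
    run_name: str = ""
    lr: float = 3e-4
    max_epochs: int = 51


def apply_modifiers(base: ToyConfig, name: str, modifiers: list) -> ToyConfig:
    # Work on a copy so the shared base config is never mutated
    config = deepcopy(base)
    config.run_name = name
    for modify in modifiers:
        if isinstance(modify, dict):
            # Partial override: change only the listed fields
            config = replace(config, **modify)
        else:
            # Function modifier: receives a config, returns a config
            config = modify(config)
    return config


def halve_lr(config: ToyConfig) -> ToyConfig:
    config.lr = config.lr / 2
    return config


base = ToyConfig()
result = apply_modifiers(base, "toy-exp", [{"max_epochs": 10}, halve_lr])
print(result)
print(base)
```

Because every experiment starts from a deep copy, running several experiment names in one call does not leak settings from one run into the next.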