DFG - Deepfake Genome Codebase
1. Environment Setup
Create and activate the conda environment:
# Create a new conda environment (Python 3.10 recommended)
conda create -n dfg python=3.10 -y
# Activate the environment
conda activate dfg
# Install dependencies
pip install -r requirements.txt
2. Dataset Configuration
Before training or testing, you need to update the dataset global path to match your actual data location.
Open training/dataset/abstract_dataset.py and modify the DATASET_GLOBAL_PATH variable:
# Change this to your actual dataset root path
DATASET_GLOBAL_PATH = "/your/actual/dataset/path/"
This path should point to the root directory containing your deepfake detection datasets (e.g., DeepFakeGenome, deepfake_detecton_dataset, etc.).
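Before launching a long run, it can help to confirm that the configured root exists and contains the dataset folders you expect. The following is a minimal sketch, not part of the codebase: `list_dataset_dirs` is a hypothetical helper, and the path shown is the placeholder from the snippet above.

```python
from pathlib import Path


def list_dataset_dirs(root: str) -> list[str]:
    """Return the sorted names of the sub-directories directly under `root`.

    Raises FileNotFoundError if `root` is not an existing directory.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        raise FileNotFoundError(f"Dataset root not found: {root_path}")
    return sorted(p.name for p in root_path.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Placeholder path copied from the config snippet; replace it with your own.
    DATASET_GLOBAL_PATH = "/your/actual/dataset/path/"
    try:
        print(list_dataset_dirs(DATASET_GLOBAL_PATH))
    except FileNotFoundError as err:
        print(err)
```

If the printed list does not include the dataset folders you expect, the path probably points one level too high or too low.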
3. Project and Dataset Structure
DFG/
├── preprocessing/
│   └── dataset_json/              # Dataset index JSON files
│       ├── protocol_2_train.json
│       ├── protocol_2_test.json
│       ├── protocol_3_test.json
│       ├── protocol_4_test.json
│       └── ...
├── training/
│   ├── config/
│   │   └── detector/              # Detector config YAML files
│   ├── detectors/                 # Detector implementations
│   │   ├── __init__.py            # Register all detectors here
│   │   ├── base_detector.py
│   │   └── ...
│   ├── networks/                  # Backbone network implementations
│   ├── loss/                      # Loss function definitions
│   ├── metrics/                   # Evaluation metrics
│   ├── train.py                   # Training entry point
│   └── test_pall.py               # Testing entry point
├── train.sh                       # Training script examples
├── test.sh                        # Testing script examples
├── requirements.txt               # Python dependencies
└── README.md
4. Training
Refer to train.sh for all training commands. Example:
python -m torch.distributed.launch --master_port=29503 --nproc_per_node=8 training/train.py \
--detector_path ./training/config/detector/clip_large_fft.yaml \
--no-save_feat --ddp
Key arguments:
--master_port: port for distributed training (change it if a port conflict occurs)
--nproc_per_node: number of GPUs
--detector_path: path to the detector config YAML
--no-save_feat: disable feature saving during training
--ddp: enable DistributedDataParallel
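If the launcher fails because the master port is already in use, a quick check can suggest a replacement value for --master_port. This is a minimal sketch using only the standard library; `is_port_free` and `find_free_port` are hypothetical helpers, not part of the codebase, and the default start value is simply the port used in the example above.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if `port` can be bound on `host` right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding failed, most likely because another process holds the port.
            return False
    return True


def find_free_port(start: int = 29503, tries: int = 50) -> int:
    """Return the first free port in the range [start, start + tries)."""
    for port in range(start, start + tries):
        if is_port_free(port):
            return port
    raise RuntimeError(f"No free port in {start}..{start + tries - 1}")


if __name__ == "__main__":
    print(find_free_port())
```

The check is only a snapshot: another process could still take the port between this check and the actual launch, so treat the result as a suggestion.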
5. Testing
Refer to test.sh for all testing commands. Example:
# Test on protocol 2 & 3
python -m torch.distributed.launch --master_port=29510 --nproc_per_node=8 training/test_pall.py --ddp \
--test_dataset "protocol_2_test" "protocol_3_test" \
--detector_path ./training/config/detector/clip_large_fft.yaml \
--weights_path logs/clip_models/clip_large_fft_2025-11-08-13-56-51
# Test on protocol 4
python -m torch.distributed.launch --master_port=29512 --nproc_per_node=8 training/test_pall.py --ddp \
--test_dataset "protocol_4_test" \
--detector_path ./training/config/detector/clip_large_fft.yaml \
--weights_path logs/clip_models/clip_large_fft_2025-11-08-13-56-51 \
--test_config test_config_p4.yaml
Key arguments:
--test_dataset: one or more dataset names (each must match a JSON filename under preprocessing/dataset_json/)
--weights_path: path to the trained model checkpoint directory
--test_config: additional test configuration (required for protocol 4)
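Because each --test_dataset name has to match a JSON file under preprocessing/dataset_json/, a typo in a name is easy to catch before starting a multi-GPU job. The sketch below assumes the index files are named `<dataset_name>.json`, as the file names in the project structure suggest; `missing_dataset_indexes` is a hypothetical helper, not part of the codebase.

```python
from pathlib import Path


def missing_dataset_indexes(names, json_dir="preprocessing/dataset_json"):
    """Return the dataset names that have no `<name>.json` file in `json_dir`."""
    json_root = Path(json_dir)
    return [name for name in names if not (json_root / f"{name}.json").is_file()]


if __name__ == "__main__":
    # Run from the repository root so the relative path resolves.
    requested = ["protocol_2_test", "protocol_3_test"]
    missing = missing_dataset_indexes(requested)
    if missing:
        print("No index JSON found for:", ", ".join(missing))
    else:
        print("All requested datasets have an index JSON.")
```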
6. Adding a Custom Detector
To integrate your own detector into the framework, follow these three steps:
Step 1: Create the detector config YAML
Create a new file under training/config/detector/, e.g., my_detector.yaml:
# log dir
log_dir: logs/my_detector
# model setting
pretrained: null
model_name: my_detector
backbone_name: resnet34
# backbone setting
backbone_config:
  mode: original
  num_classes: 2
  inc: 3
  dropout: false
# dataset
all_dataset: [FaceForensics++, FF-F2F, FF-DF, FF-FS, FF-NT, FaceShifter, DeepFakeDetection, Celeb-DF-v1, Celeb-DF-v2, DFDCP, DFDC, DeeperForensics-1.0, UADFV]
train_dataset: [protocol_2_train]
test_dataset: [protocol_2_test]
compression: c23
train_batchSize: 64
test_batchSize: 64
workers: 8
frame_num: {'train': 16, 'test': 16}
resolution: 224
with_mask: false
with_landmark: false
# data augmentation
use_data_augmentation: false
data_aug:
  flip_prob: 0.5
  rotate_prob: 0.5
  rotate_limit: [-10, 10]
  blur_prob: 0.5
  blur_limit: [3, 7]
  brightness_prob: 0.5
  brightness_limit: [-0.1, 0.1]
  contrast_limit: [-0.1, 0.1]
  quality_lower: 40
  quality_upper: 100
# mean and std for normalization
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
# optimizer config
optimizer:
  type: adam
  adam:
    lr: 0.0002
    beta1: 0.9
    beta2: 0.999
    eps: 0.00000001
    weight_decay: 0.0005
    amsgrad: false
# training config
lr_scheduler: null
nEpochs: 20
start_epoch: 0
save_epoch: 1
rec_iter: 100
logdir: ./logs
manualSeed: 1024
save_ckpt: true
save_feat: true
# loss function
loss_func: cross_entropy
losstype: null
# metric
metric_scoring: auc
# cuda
ngpu: 1
cuda: true
cudnn: true
save_avg: true
save_latest_ckpt: true
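A small sanity check on the parsed config can surface missing keys before training starts. The sketch below works on a plain dict, which in practice would come from loading the YAML file (for example with PyYAML's `yaml.safe_load`). The list of required keys is an assumption drawn from the template above, not a statement of what the framework actually enforces, and `check_detector_config` is a hypothetical helper.

```python
# Keys taken from the template above; which ones the framework strictly
# requires is an assumption.
REQUIRED_KEYS = ("model_name", "backbone_name", "train_dataset",
                 "test_dataset", "loss_func")


def check_detector_config(config: dict) -> list[str]:
    """Return a list of human-readable problems found in `config`."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]

    # The optimizer named by `type` should have its own settings block.
    optimizer = config.get("optimizer", {})
    opt_type = optimizer.get("type")
    if opt_type is not None and opt_type not in optimizer:
        problems.append(
            f"optimizer.type is {opt_type!r} but no {opt_type!r} block is defined"
        )

    # frame_num is expected to list both splits.
    frame_num = config.get("frame_num", {})
    for split in ("train", "test"):
        if split not in frame_num:
            problems.append(f"frame_num is missing the {split!r} entry")
    return problems


if __name__ == "__main__":
    example = {
        "model_name": "my_detector",
        "backbone_name": "resnet34",
        "train_dataset": ["protocol_2_train"],
        "test_dataset": ["protocol_2_test"],
        "loss_func": "cross_entropy",
        "optimizer": {"type": "adam", "adam": {"lr": 0.0002}},
        "frame_num": {"train": 16, "test": 16},
    }
    print(check_detector_config(example))
```

An empty list means none of these particular checks failed; it does not guarantee that the config is otherwise valid for the framework.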
Step 2: Create the detector Python file
Create training/detectors/my_detector.py:
import torch
import torch.nn as nn

from metrics.base_metrics_class import calculate_metrics_for_train
from .base_detector import AbstractDetector
from detectors import DETECTOR
from networks import BACKBONE
from loss import LOSSFUNC


@DETECTOR.register_module(module_name='my_detector')
class MyDetector(AbstractDetector):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.backbone = self.build_backbone(config)
        # NOTE: classifier() uses self.fc, which the original template never
        # defined. The head below is an assumption: it expects the backbone to
        # return 512-dim pooled features (the usual size for ResNet-34).
        # Adjust in_features for your backbone, or drop this head if the
        # backbone already returns class logits.
        self.fc = nn.Linear(512, config['backbone_config']['num_classes'])
        self.loss_func = LOSSFUNC[config['loss_func']]()

    def build_backbone(self, config):
        # Look up the backbone class by name and build it from its config.
        backbone = BACKBONE[config['backbone_name']](config['backbone_config'])
        return backbone

    def features(self, data_dict: dict) -> torch.Tensor:
        return self.backbone(data_dict['image'])

    def classifier(self, features: torch.Tensor) -> torch.Tensor:
        return self.fc(features)

    def get_losses(self, data_dict: dict, pred_dict: dict) -> dict:
        label = data_dict['label']
        pred = pred_dict['cls']
        loss = self.loss_func(pred, label)
        return {'overall': loss}

    def get_train_metrics(self, data_dict: dict, pred_dict: dict) -> dict:
        label = data_dict['label']
        pred = pred_dict['cls']
        auc, eer, acc, ap = calculate_metrics_for_train(label.detach(), pred.detach())
        return {'acc': acc, 'auc': auc, 'eer': eer, 'ap': ap}

    def forward(self, data_dict: dict, inference=False) -> dict:
        features = self.features(data_dict)
        pred = self.classifier(features)
        # Softmax probability of class index 1.
        prob = torch.softmax(pred, dim=1)[:, 1]
        pred_dict = {'cls': pred, 'prob': prob, 'feat': features}
        return pred_dict
Step 3: Register the detector in __init__.py
Add the following import line to training/detectors/__init__.py:
from .my_detector import MyDetector
That's it! Now you can train and test with your custom detector:
# Train
python -m torch.distributed.launch --master_port=29503 --nproc_per_node=8 training/train.py \
--detector_path ./training/config/detector/my_detector.yaml \
--no-save_feat --ddp
# Test
python -m torch.distributed.launch --master_port=29510 --nproc_per_node=8 training/test_pall.py --ddp \
--test_dataset "protocol_2_test" "protocol_3_test" \
--detector_path ./training/config/detector/my_detector.yaml \
--weights_path logs/my_detector/<your_checkpoint_folder>
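Step 3 matters because the @DETECTOR.register_module decorator only runs when the module is imported; without the import in __init__.py, the name in model_name would have nothing to resolve to. The sketch below illustrates the general registry pattern with a hypothetical `Registry` class. It is not the framework's actual implementation, and `DETECTOR_SKETCH` is a stand-in name; the subscript lookup mirrors the `BACKBONE[...]` and `LOSSFUNC[...]` usage in Step 2.

```python
class Registry:
    """Maps string names to classes (a hypothetical stand-in for DETECTOR)."""

    def __init__(self):
        self.data = {}

    def register_module(self, module_name=None):
        """Return a class decorator that stores the class under `module_name`."""
        def decorator(cls):
            name = module_name or cls.__name__
            if name in self.data:
                raise KeyError(f"{name!r} is already registered")
            self.data[name] = cls
            return cls  # The class itself is returned unchanged.
        return decorator

    def __getitem__(self, name):
        return self.data[name]


DETECTOR_SKETCH = Registry()


# Registration happens as a side effect of executing the class definition,
# which is why the defining module has to be imported somewhere.
@DETECTOR_SKETCH.register_module(module_name="my_detector")
class MyDetector:
    def __init__(self, config):
        self.config = config


# A training script can then build the detector from the config alone.
config = {"model_name": "my_detector"}
detector = DETECTOR_SKETCH[config["model_name"]](config)
print(type(detector).__name__)
```

With this pattern, a lookup for a name whose module was never imported fails with a KeyError, which is the kind of error to expect if Step 3 is skipped.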