# DFG - Deepfake Genome Codebase

## 1. Environment Setup

Create and activate the conda environment:

```bash
# Create a new conda environment (Python 3.10 recommended)
conda create -n dfg python=3.10 -y

# Activate the environment
conda activate dfg

# Install dependencies
pip install -r requirements.txt
```

## 2. Dataset Configuration

Before training or testing, you need to update the **dataset global path** to match your actual data location.

Open `training/dataset/abstract_dataset.py` and modify the `DATASET_GLOBAL_PATH` variable:

```python
# Change this to your actual dataset root path
DATASET_GLOBAL_PATH = "/your/actual/dataset/path/"
```

This path should point to the root directory containing your deepfake detection datasets (e.g., `DeepFakeGenome`, `deepfake_detecton_dataset`, etc.).

## 3. Project and Dataset Structure

```
DFG/
├── preprocessing/
│   └── dataset_json/               # Dataset index JSON files
│       ├── protocol_2_train.json
│       ├── protocol_2_test.json
│       ├── protocol_3_test.json
│       ├── protocol_4_test.json
│       └── ...
├── training/
│   ├── config/
│   │   └── detector/               # Detector config YAML files
│   ├── detectors/                  # Detector implementations
│   │   ├── __init__.py             # Register all detectors here
│   │   ├── base_detector.py
│   │   └── ...
│   ├── networks/                   # Backbone network implementations
│   ├── loss/                       # Loss function definitions
│   ├── metrics/                    # Evaluation metrics
│   ├── train.py                    # Training entry point
│   └── test_pall.py                # Testing entry point
├── train.sh                        # Training script examples
├── test.sh                         # Testing script examples
├── requirements.txt                # Python dependencies
└── README.md
```

## 4. Training

Refer to `train.sh` for all training commands.
Example:

```bash
python -m torch.distributed.launch --master_port=29503 --nproc_per_node=8 training/train.py \
    --detector_path ./training/config/detector/clip_large_fft.yaml \
    --no-save_feat --ddp
```

Key arguments:

- `--master_port`: port for distributed training (change it if a port conflict occurs)
- `--nproc_per_node`: number of processes to launch per node (one per GPU)
- `--detector_path`: path to the detector config YAML
- `--no-save_feat`: disable feature saving during training
- `--ddp`: enable DistributedDataParallel

## 5. Testing

Refer to `test.sh` for all testing commands. Example:

```bash
# Test on protocol 2 & 3
python -m torch.distributed.launch --master_port=29510 --nproc_per_node=8 training/test_pall.py --ddp \
    --test_dataset "protocol_2_test" "protocol_3_test" \
    --detector_path ./training/config/detector/clip_large_fft.yaml \
    --weights_path logs/clip_models/clip_large_fft_2025-11-08-13-56-51

# Test on protocol 4
python -m torch.distributed.launch --master_port=29512 --nproc_per_node=8 training/test_pall.py --ddp \
    --test_dataset "protocol_4_test" \
    --detector_path ./training/config/detector/clip_large_fft.yaml \
    --weights_path logs/clip_models/clip_large_fft_2025-11-08-13-56-51 \
    --test_config test_config_p4.yaml
```

Key arguments:

- `--test_dataset`: one or more dataset names (each must match a JSON filename under `preprocessing/dataset_json/`)
- `--weights_path`: path to the trained model checkpoint directory
- `--test_config`: additional test configuration (required for protocol 4)

## 6. Adding a Custom Detector

To integrate your own detector into the framework, follow these three steps:

### Step 1: Create the detector config YAML

Create a new file under `training/config/detector/`, e.g., `my_detector.yaml`:

```yaml
# log dir
log_dir: logs/my_detector

# model setting
pretrained: null
model_name: my_detector
backbone_name: resnet34

# backbone setting
backbone_config:
  mode: original
  num_classes: 2
  inc: 3
  dropout: false

# dataset
all_dataset: [FaceForensics++, FF-F2F, FF-DF, FF-FS, FF-NT, FaceShifter, DeepFakeDetection, Celeb-DF-v1, Celeb-DF-v2, DFDCP, DFDC, DeeperForensics-1.0, UADFV]
train_dataset: [protocol_2_train]
test_dataset: [protocol_2_test]

compression: c23
train_batchSize: 64
test_batchSize: 64
workers: 8
frame_num: {'train': 16, 'test': 16}
resolution: 224
with_mask: false
with_landmark: false

# data augmentation
use_data_augmentation: false
data_aug:
  flip_prob: 0.5
  rotate_prob: 0.5
  rotate_limit: [-10, 10]
  blur_prob: 0.5
  blur_limit: [3, 7]
  brightness_prob: 0.5
  brightness_limit: [-0.1, 0.1]
  contrast_limit: [-0.1, 0.1]
  quality_lower: 40
  quality_upper: 100

# mean and std for normalization
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]

# optimizer config
optimizer:
  type: adam
  adam:
    lr: 0.0002
    beta1: 0.9
    beta2: 0.999
    eps: 0.00000001
    weight_decay: 0.0005
    amsgrad: false

# training config
lr_scheduler: null
nEpochs: 20
start_epoch: 0
save_epoch: 1
rec_iter: 100
logdir: ./logs
manualSeed: 1024
save_ckpt: true
save_feat: true

# loss function
loss_func: cross_entropy
losstype: null

# metric
metric_scoring: auc

# cuda
ngpu: 1
cuda: true
cudnn: true

save_avg: true
save_latest_ckpt: true
```

### Step 2: Create the detector Python file

Create `training/detectors/my_detector.py`:

```python
import torch
import torch.nn as nn

from metrics.base_metrics_class import calculate_metrics_for_train

from .base_detector import AbstractDetector
from detectors import DETECTOR
from networks import BACKBONE
from loss import LOSSFUNC


@DETECTOR.register_module(module_name='my_detector')
class MyDetector(AbstractDetector):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.backbone = self.build_backbone(config)
        # Classification head used by classifier(). The input size of 512 is an
        # assumption (the pooled feature width of ResNet-34); adjust it to the
        # feature dimension your backbone actually returns.
        self.fc = nn.Linear(512, config['backbone_config']['num_classes'])
        self.loss_func = LOSSFUNC[config['loss_func']]()

    def build_backbone(self, config):
        # Look up the backbone class by name and build it from its config
        backbone = BACKBONE[config['backbone_name']](config['backbone_config'])
        return backbone

    def features(self, data_dict: dict) -> torch.Tensor:
        return self.backbone(data_dict['image'])

    def classifier(self, features: torch.Tensor) -> torch.Tensor:
        return self.fc(features)

    def get_losses(self, data_dict: dict, pred_dict: dict) -> dict:
        label = data_dict['label']
        pred = pred_dict['cls']
        loss = self.loss_func(pred, label)
        return {'overall': loss}

    def get_train_metrics(self, data_dict: dict, pred_dict: dict) -> dict:
        label = data_dict['label']
        pred = pred_dict['cls']
        auc, eer, acc, ap = calculate_metrics_for_train(label.detach(), pred.detach())
        return {'acc': acc, 'auc': auc, 'eer': eer, 'ap': ap}

    def forward(self, data_dict: dict, inference=False) -> dict:
        features = self.features(data_dict)
        pred = self.classifier(features)
        # Probability of class index 1
        prob = torch.softmax(pred, dim=1)[:, 1]
        pred_dict = {'cls': pred, 'prob': prob, 'feat': features}
        return pred_dict
```

### Step 3: Register the detector in `__init__.py`

Add the following import line to `training/detectors/__init__.py`:

```python
from .my_detector import MyDetector
```

That's it! Now you can train and test with your custom detector:

```bash
# Train
python -m torch.distributed.launch --master_port=29503 --nproc_per_node=8 training/train.py \
    --detector_path ./training/config/detector/my_detector.yaml \
    --no-save_feat --ddp

# Test
python -m torch.distributed.launch --master_port=29510 --nproc_per_node=8 training/test_pall.py --ddp \
    --test_dataset "protocol_2_test" "protocol_3_test" \
    --detector_path ./training/config/detector/my_detector.yaml \
    --weights_path logs/my_detector/
```
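The import in Step 3 matters because the `@DETECTOR.register_module` decorator only runs when the module is imported. The sketch below illustrates that registry pattern in isolation. The `Registry` class and the lookup by `model_name` are assumptions made for illustration; the repository's own `DETECTOR` object and the way `train.py` resolves the detector may differ.

```python
class Registry:
    """Hypothetical stand-in for the DETECTOR registry (not the repository's class)."""

    def __init__(self):
        self._modules = {}

    def register_module(self, module_name=None):
        # Return a decorator that stores the decorated class under module_name
        # (falling back to the class name when no name is given).
        def decorator(cls):
            name = module_name if module_name is not None else cls.__name__
            if name in self._modules:
                raise KeyError(f"{name!r} is already registered")
            self._modules[name] = cls
            return cls
        return decorator

    def __getitem__(self, name):
        return self._modules[name]

    def __contains__(self, name):
        return name in self._modules


DETECTOR = Registry()


# Registration happens as a side effect of executing this class definition,
# which is why the module has to be imported somewhere.
@DETECTOR.register_module(module_name='my_detector')
class MyDetector:
    def __init__(self, config):
        self.config = config


# Assumed lookup: resolve the class from the `model_name` value in the YAML config.
config = {'model_name': 'my_detector'}
model = DETECTOR[config['model_name']](config)
print(type(model).__name__)  # MyDetector
```

If the import line is missing, the class definition never executes, so a lookup such as `DETECTOR['my_detector']` in this sketch would raise `KeyError`.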