# DFG - Deepfake Genome Codebase

## 1. Environment Setup

Create and activate the conda environment:

```bash
# Create a new conda environment (Python 3.10 recommended)
conda create -n dfg python=3.10 -y

# Activate the environment
conda activate dfg

# Install dependencies
pip install -r requirements.txt
```

## 2. Dataset Configuration

Before training or testing, set the **dataset global path** to the directory where your data is stored.

Open `training/dataset/abstract_dataset.py` and modify the `DATASET_GLOBAL_PATH` variable:

```python
# Change this to your actual dataset root path
DATASET_GLOBAL_PATH = "/your/actual/dataset/path/"
```

This path should point to the root directory containing your deepfake detection datasets (e.g., `DeepFakeGenome`, `deepfake_detecton_dataset`).

## 3. Project and Dataset Structure

```
DFG/
├── preprocessing/
│   └── dataset_json/           # Dataset index JSON files
│       ├── protocol_2_train.json
│       ├── protocol_2_test.json
│       ├── protocol_3_test.json
│       ├── protocol_4_test.json
│       └── ...
├── training/
│   ├── config/
│   │   └── detector/           # Detector config YAML files
│   ├── detectors/              # Detector implementations
│   │   ├── __init__.py         # Register all detectors here
│   │   ├── base_detector.py
│   │   └── ...
│   ├── networks/               # Backbone network implementations
│   ├── loss/                   # Loss function definitions
│   ├── metrics/                # Evaluation metrics
│   ├── train.py                # Training entry point
│   └── test_pall.py            # Testing entry point
├── train.sh                    # Training script examples
├── test.sh                     # Testing script examples
├── requirements.txt            # Python dependencies
└── README.md
```

## 4. Training

Refer to `train.sh` for all training commands. Example:

```bash
python -m torch.distributed.launch --master_port=29503 --nproc_per_node=8 training/train.py \
    --detector_path ./training/config/detector/clip_large_fft.yaml \
    --no-save_feat --ddp
```

Key arguments:
- `--master_port`: port used for distributed training (change it if the port is already in use)
- `--nproc_per_node`: number of processes to launch on the node, one per GPU
- `--detector_path`: path to the detector config YAML
- `--no-save_feat`: disable feature saving during training
- `--ddp`: enable DistributedDataParallel
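
If the default port is taken, a free one can be found before launching. The sketch below uses only the Python standard library; the helper names `port_is_free` and `first_free_port` are hypothetical and not part of this codebase. Note that a port reported as free can still be claimed by another process before the launcher binds it.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if `port` can currently be bound on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start: int = 29503, attempts: int = 50) -> int:
    """Return the first bindable port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"No free port in {start}..{start + attempts - 1}")


if __name__ == "__main__":
    # Prints a candidate value for --master_port
    print(first_free_port())
```

Pass the printed value as `--master_port` in the launch command.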

## 5. Testing

Refer to `test.sh` for all testing commands. Example:

```bash
# Test on protocol 2 & 3
python -m torch.distributed.launch --master_port=29510 --nproc_per_node=8 training/test_pall.py --ddp \
    --test_dataset "protocol_2_test" "protocol_3_test" \
    --detector_path ./training/config/detector/clip_large_fft.yaml \
    --weights_path logs/clip_models/clip_large_fft_2025-11-08-13-56-51

# Test on protocol 4
python -m torch.distributed.launch --master_port=29512 --nproc_per_node=8 training/test_pall.py --ddp \
    --test_dataset "protocol_4_test" \
    --detector_path ./training/config/detector/clip_large_fft.yaml \
    --weights_path logs/clip_models/clip_large_fft_2025-11-08-13-56-51 \
    --test_config test_config_p4.yaml
```

Key arguments:
- `--test_dataset`: one or more dataset names (must match JSON filenames under `preprocessing/dataset_json/`)
- `--weights_path`: path to trained model checkpoint directory
- `--test_config`: additional test configuration (required for protocol 4)
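
Because every `--test_dataset` name must match a JSON file, a typo can be caught before starting a multi-GPU job. This is a minimal sketch, assuming each name maps to `<name>.json` directly under `preprocessing/dataset_json/`; the helper `missing_dataset_indexes` is hypothetical and not part of this codebase.

```python
from pathlib import Path


def missing_dataset_indexes(
    names: list[str], json_dir: str = "preprocessing/dataset_json"
) -> list[str]:
    """Return the dataset names that have no `<name>.json` file in `json_dir`."""
    base = Path(json_dir)
    return [name for name in names if not (base / f"{name}.json").is_file()]


if __name__ == "__main__":
    # Run from the repository root so the default relative path resolves.
    missing = missing_dataset_indexes(["protocol_2_test", "protocol_3_test"])
    print("missing index files:", missing)
```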

## 6. Adding a Custom Detector

To integrate your own detector into the framework, follow these three steps:

### Step 1: Create the detector config YAML

Create a new file under `training/config/detector/`, e.g., `my_detector.yaml`:

```yaml
# log dir
log_dir: logs/my_detector

# model setting
pretrained: null
model_name: my_detector
backbone_name: resnet34

# backbone setting
backbone_config:
  mode: original
  num_classes: 2
  inc: 3
  dropout: false

# dataset
all_dataset: [FaceForensics++, FF-F2F, FF-DF, FF-FS, FF-NT, FaceShifter, DeepFakeDetection, Celeb-DF-v1, Celeb-DF-v2, DFDCP, DFDC, DeeperForensics-1.0, UADFV]
train_dataset: [protocol_2_train]
test_dataset: [protocol_2_test]

compression: c23
train_batchSize: 64
test_batchSize: 64
workers: 8
frame_num: {'train': 16, 'test': 16}
resolution: 224
with_mask: false
with_landmark: false

# data augmentation
use_data_augmentation: false
data_aug:
  flip_prob: 0.5
  rotate_prob: 0.5
  rotate_limit: [-10, 10]
  blur_prob: 0.5
  blur_limit: [3, 7]
  brightness_prob: 0.5
  brightness_limit: [-0.1, 0.1]
  contrast_limit: [-0.1, 0.1]
  quality_lower: 40
  quality_upper: 100

# mean and std for normalization
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]

# optimizer config
optimizer:
  type: adam
  adam:
    lr: 0.0002
    beta1: 0.9
    beta2: 0.999
    eps: 0.00000001
    weight_decay: 0.0005
    amsgrad: false

# training config
lr_scheduler: null
nEpochs: 20
start_epoch: 0
save_epoch: 1
rec_iter: 100
logdir: ./logs
manualSeed: 1024
save_ckpt: true
save_feat: true

# loss function
loss_func: cross_entropy
losstype: null

# metric
metric_scoring: auc

# cuda
ngpu: 1
cuda: true
cudnn: true

save_avg: true
save_latest_ckpt: true
```
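
A new config can be checked for structural mistakes before it is used, for example nested keys that ended up at the top level. The sketch below works on an already loaded config dict (for instance the result of PyYAML's `yaml.safe_load`). The `REQUIRED_KEYS` selection and the rule that `optimizer` must contain a block named after `optimizer.type` are assumptions read off the example config, not checks the codebase is known to perform; `config_problems` is a hypothetical helper.

```python
# Assumption: a hand-picked subset of keys from the example config.
REQUIRED_KEYS = (
    "model_name", "backbone_name", "backbone_config",
    "train_dataset", "test_dataset", "optimizer", "loss_func",
)


def config_problems(cfg: dict) -> list[str]:
    """Return human-readable problems found in a detector config dict."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]
    optimizer = cfg.get("optimizer")
    if isinstance(optimizer, dict):
        opt_type = optimizer.get("type")
        if opt_type is None:
            problems.append("optimizer.type is not set")
        elif opt_type not in optimizer:
            # e.g. `type: adam` without a nested `adam:` block
            problems.append(f"optimizer.{opt_type} block is missing")
    elif "optimizer" in cfg:
        problems.append("optimizer must be a mapping")
    return problems


if __name__ == "__main__":
    example = {"model_name": "my_detector", "optimizer": {"type": "adam"}}
    for problem in config_problems(example):
        print(problem)
```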

### Step 2: Create the detector Python file

Create `training/detectors/my_detector.py`:

```python
import torch
import torch.nn as nn

from metrics.base_metrics_class import calculate_metrics_for_train
from .base_detector import AbstractDetector
from detectors import DETECTOR
from networks import BACKBONE
from loss import LOSSFUNC


@DETECTOR.register_module(module_name='my_detector')
class MyDetector(AbstractDetector):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.backbone = self.build_backbone(config)
        # Classification head used by classifier(). Assumption: the backbone
        # returns a flat (N, feat_dim) tensor; 512 is a placeholder, so set
        # feat_dim to your backbone's actual output width.
        feat_dim = 512
        self.fc = nn.Linear(feat_dim, config['backbone_config']['num_classes'])
        self.loss_func = LOSSFUNC[config['loss_func']]()

    def build_backbone(self, config):
        # Look up the backbone class by name and build it from its config block
        backbone = BACKBONE[config['backbone_name']](config['backbone_config'])
        return backbone

    def features(self, data_dict: dict) -> torch.Tensor:
        return self.backbone(data_dict['image'])

    def classifier(self, features: torch.Tensor) -> torch.Tensor:
        return self.fc(features)

    def get_losses(self, data_dict: dict, pred_dict: dict) -> dict:
        label = data_dict['label']
        pred = pred_dict['cls']
        loss = self.loss_func(pred, label)
        return {'overall': loss}

    def get_train_metrics(self, data_dict: dict, pred_dict: dict) -> dict:
        label = data_dict['label']
        pred = pred_dict['cls']
        auc, eer, acc, ap = calculate_metrics_for_train(label.detach(), pred.detach())
        return {'acc': acc, 'auc': auc, 'eer': eer, 'ap': ap}

    def forward(self, data_dict: dict, inference=False) -> dict:
        features = self.features(data_dict)
        pred = self.classifier(features)
        # Softmax probability of class index 1 for each sample
        prob = torch.softmax(pred, dim=1)[:, 1]
        pred_dict = {'cls': pred, 'prob': prob, 'feat': features}
        return pred_dict
```
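
The `prob` entry returned by `forward` is the softmax probability of class index 1. The dependency-free sketch below reproduces that computation for a single sample's logits. That index 1 denotes the fake class is an assumption here, so check your dataset's label convention; `class1_probability` is a hypothetical name.

```python
import math


def class1_probability(logits: list[float]) -> float:
    """Softmax probability of class index 1 for one sample's logits."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - shift) for value in logits]
    return exps[1] / sum(exps)


if __name__ == "__main__":
    print(class1_probability([0.0, 0.0]))  # equal logits give 0.5
    print(class1_probability([0.0, 2.0]))  # larger class-1 logit gives more than 0.5
```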

### Step 3: Register the detector in `__init__.py`

Add the following import line to `training/detectors/__init__.py`:

```python
from .my_detector import MyDetector
```
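
The import matters because registration is a side effect of the `@DETECTOR.register_module(...)` decorator, which only runs when the module is imported. The following is a minimal sketch of how a decorator-based registry of this kind typically works. The `Registry` class and the simplified `MyDetector` are illustrative stand-ins, not the codebase's actual implementation, and the lookup by the config's `model_name` is an assumption.

```python
class Registry:
    """Minimal name -> class registry with a decorator-based register method."""

    def __init__(self):
        self._classes = {}

    def register_module(self, module_name=None):
        def decorator(cls):
            # Fall back to the class name when no explicit name is given.
            self._classes[module_name or cls.__name__] = cls
            return cls
        return decorator

    def __getitem__(self, name):
        return self._classes[name]

    def __contains__(self, name):
        return name in self._classes


DETECTOR = Registry()  # stand-in for the framework's registry object


@DETECTOR.register_module(module_name="my_detector")
class MyDetector:
    def __init__(self, config):
        self.config = config


# Registration happened when the class statement ran, so lookup by name works.
config = {"model_name": "my_detector"}
detector = DETECTOR[config["model_name"]](config)
print(type(detector).__name__)  # MyDetector
```

If the module is never imported, the class statement never executes and the name is absent from the registry, which is why the import in `__init__.py` is needed.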

That's it! Now you can train and test with your custom detector:

```bash
# Train
python -m torch.distributed.launch --master_port=29503 --nproc_per_node=8 training/train.py \
    --detector_path ./training/config/detector/my_detector.yaml \
    --no-save_feat --ddp

# Test
python -m torch.distributed.launch --master_port=29510 --nproc_per_node=8 training/test_pall.py --ddp \
    --test_dataset "protocol_2_test" "protocol_3_test" \
    --detector_path ./training/config/detector/my_detector.yaml \
    --weights_path logs/my_detector/<your_checkpoint_folder>
```