# DFG - Deepfake Genome Codebase

## 1. Environment Setup

Create and activate the conda environment:

```bash
# Create a new conda environment (Python 3.10 recommended)
conda create -n dfg python=3.10 -y

# Activate the environment
conda activate dfg

# Install dependencies
pip install -r requirements.txt
```
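After installing, a quick sanity check can confirm the interpreter version and that the key packages import inside the `dfg` environment. The following is a minimal sketch. The module list (`torch`, `yaml`, `numpy`) is an assumption, so align it with `requirements.txt`, and the function name `check_environment` is hypothetical.

```python
import importlib.util
import sys

# Assumed subset of requirements.txt; edit to match the real file.
REQUIRED_MODULES = ("torch", "yaml", "numpy")


def check_environment(modules=REQUIRED_MODULES, min_python=(3, 10)):
    """Report the Python version and which top-level modules cannot be found."""
    return {
        "python": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info[:2] >= min_python,  # 3.10 is recommended
        "missing": [m for m in modules if importlib.util.find_spec(m) is None],
    }


if __name__ == "__main__":
    report = check_environment()
    print(f"Python {report['python']} (>= 3.10: {report['python_ok']})")
    if report["missing"]:
        print("Missing modules:", ", ".join(report["missing"]))
    else:
        print("All listed modules found.")
    # If PyTorch is installed, also report how many GPUs it can see.
    if importlib.util.find_spec("torch") is not None:
        import torch
        print("CUDA available:", torch.cuda.is_available(),
              "| GPU count:", torch.cuda.device_count())
```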

## 2. Dataset Configuration

Before training or testing, you need to update the **dataset global path** to match your actual data location.

Open `training/dataset/abstract_dataset.py` and modify the `DATASET_GLOBAL_PATH` variable:

```python
# Change this to your actual dataset root path
DATASET_GLOBAL_PATH = "/your/actual/dataset/path/"
```

This path should point to the root directory that contains your deepfake detection datasets (e.g., `DeepFakeGenome`, `deepfake_detecton_dataset`).
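A wrong `DATASET_GLOBAL_PATH` usually only surfaces once data loading starts, so it can help to check it up front. This is a small sketch with a hypothetical helper, `check_dataset_root`. The expected subfolder names are just the examples named in this README, so replace them with the datasets you actually use.

```python
import os

# Example subfolders taken from this README; adjust to your own layout.
EXPECTED_SUBDIRS = ("DeepFakeGenome", "deepfake_detecton_dataset")


def check_dataset_root(root, expected=EXPECTED_SUBDIRS):
    """Return (found, missing) lists of expected subdirectories under root."""
    if not os.path.isdir(root):
        raise FileNotFoundError(f"DATASET_GLOBAL_PATH does not exist: {root}")
    found = [d for d in expected if os.path.isdir(os.path.join(root, d))]
    missing = [d for d in expected if d not in found]
    return found, missing
```

Call it with the same string you assigned to `DATASET_GLOBAL_PATH`; an empty `missing` list means every listed dataset folder is present.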

## 3. Project and Dataset Structure

```
DFG/
├── preprocessing/
│   └── dataset_json/          # Dataset index JSON files
│       ├── protocol_2_train.json
│       ├── protocol_2_test.json
│       ├── protocol_3_test.json
│       ├── protocol_4_test.json
│       └── ...
├── training/
│   ├── config/
│   │   └── detector/          # Detector config YAML files
│   ├── detectors/             # Detector implementations
│   │   ├── __init__.py        # Register all detectors here
│   │   ├── base_detector.py
│   │   └── ...
│   ├── networks/              # Backbone network implementations
│   ├── loss/                  # Loss function definitions
│   ├── metrics/               # Evaluation metrics
│   ├── train.py               # Training entry point
│   └── test_pall.py           # Testing entry point
├── train.sh                   # Training script examples
├── test.sh                    # Testing script examples
├── requirements.txt           # Python dependencies
└── README.md
```
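Before launching any command, you can confirm that your checkout matches this layout. The sketch below is meant to be run from the `DFG/` root. The path list is copied from the tree, and the helper name `missing_project_paths` is hypothetical.

```python
from pathlib import Path

# Entry points and folders from the project tree (relative to DFG/).
EXPECTED_PATHS = (
    "preprocessing/dataset_json",
    "training/config/detector",
    "training/detectors/__init__.py",
    "training/train.py",
    "training/test_pall.py",
)


def missing_project_paths(repo_root="."):
    """List expected files/directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]
```

`print(missing_project_paths())` should print `[]` in a complete checkout.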

## 4. Training

Refer to `train.sh` for all training commands. Example:

```bash
python -m torch.distributed.launch --master_port=29503 --nproc_per_node=8 training/train.py \
    --detector_path ./training/config/detector/clip_large_fft.yaml \
    --no-save_feat --ddp
```

Key arguments:
- `--master_port`: port for distributed training (change it if the port is already in use)
- `--nproc_per_node`: number of processes to launch on this node, normally one per GPU
- `--detector_path`: path to the detector config YAML
- `--no-save_feat`: disable feature saving during training
- `--ddp`: enable DistributedDataParallel
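When several jobs share a machine, `--master_port` clashes are common. One standard trick is to let the OS pick an unused port by binding to port 0. The sketch below does that and assembles the training command as an argument list you could pass to `subprocess.run`. `find_free_port` and `build_train_command` are hypothetical helpers, not part of this repository. Note that another process could still grab the port between the check and the launch.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


def build_train_command(detector_yaml, nproc=8, port=None, save_feat=False):
    """Rebuild the train.sh-style launch command as a list of arguments."""
    port = port if port is not None else find_free_port()
    cmd = [
        "python", "-m", "torch.distributed.launch",
        f"--master_port={port}", f"--nproc_per_node={nproc}",
        "training/train.py",
        "--detector_path", detector_yaml,
    ]
    # Only the --no-save_feat form appears in this README; when saving is
    # wanted, the flag is simply omitted.
    if not save_feat:
        cmd.append("--no-save_feat")
    cmd.append("--ddp")
    return cmd
```

For example, `" ".join(build_train_command("./training/config/detector/clip_large_fft.yaml", port=29503))` reproduces the command shown above.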

## 5. Testing

Refer to `test.sh` for all testing commands. Example:

```bash
# Test on protocol 2 & 3
python -m torch.distributed.launch --master_port=29510 --nproc_per_node=8 training/test_pall.py --ddp \
    --test_dataset "protocol_2_test" "protocol_3_test" \
    --detector_path ./training/config/detector/clip_large_fft.yaml \
    --weights_path logs/clip_models/clip_large_fft_2025-11-08-13-56-51

# Test on protocol 4
python -m torch.distributed.launch --master_port=29512 --nproc_per_node=8 training/test_pall.py --ddp \
    --test_dataset "protocol_4_test" \
    --detector_path ./training/config/detector/clip_large_fft.yaml \
    --weights_path logs/clip_models/clip_large_fft_2025-11-08-13-56-51 \
    --test_config test_config_p4.yaml
```

Key arguments:
- `--test_dataset`: one or more dataset names (each must match a JSON filename, without the `.json` extension, under `preprocessing/dataset_json/`)
- `--weights_path`: path to the trained model's checkpoint directory
- `--test_config`: additional test configuration file (required for protocol 4)
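Since every `--test_dataset` name must correspond to an index file, a typo is easy to catch before launching eight processes. The following sketch lists the valid names from `preprocessing/dataset_json/` and builds a `test_pall.py` command, appending `--test_config test_config_p4.yaml` for protocol 4 as `test.sh` does. The helper names are hypothetical. `test.sh` runs protocol 4 in its own command, and this sketch assumes you keep it separate too.

```python
from pathlib import Path

DATASET_JSON_DIR = Path("preprocessing/dataset_json")


def available_datasets(json_dir=DATASET_JSON_DIR):
    """Names accepted by --test_dataset: JSON filenames without '.json'."""
    return sorted(p.stem for p in Path(json_dir).glob("*.json"))


def build_test_command(datasets, detector_yaml, weights_path, nproc=8,
                       port=29510, json_dir=DATASET_JSON_DIR):
    """Validate dataset names, then assemble a test.sh-style command."""
    known = set(available_datasets(json_dir))
    unknown = [d for d in datasets if d not in known]
    if unknown:
        raise ValueError(f"No JSON index found for: {unknown}")
    cmd = [
        "python", "-m", "torch.distributed.launch",
        f"--master_port={port}", f"--nproc_per_node={nproc}",
        "training/test_pall.py", "--ddp",
        "--test_dataset", *datasets,
        "--detector_path", detector_yaml,
        "--weights_path", weights_path,
    ]
    # Protocol 4 requires its extra test configuration.
    if "protocol_4_test" in datasets:
        cmd += ["--test_config", "test_config_p4.yaml"]
    return cmd
```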

## 6. Adding a Custom Detector

To integrate your own detector into the framework, follow these three steps:

### Step 1: Create the detector config YAML

Create a new file under `training/config/detector/`, e.g., `my_detector.yaml`:

```yaml
# log dir
log_dir: logs/my_detector

# model setting
pretrained: null
model_name: my_detector
backbone_name: resnet34

# backbone setting
backbone_config:
  mode: original
  num_classes: 2
  inc: 3
  dropout: false

# dataset
all_dataset: [FaceForensics++, FF-F2F, FF-DF, FF-FS, FF-NT, FaceShifter, DeepFakeDetection, Celeb-DF-v1, Celeb-DF-v2, DFDCP, DFDC, DeeperForensics-1.0, UADFV]
train_dataset: [protocol_2_train]
test_dataset: [protocol_2_test]

compression: c23
train_batchSize: 64
test_batchSize: 64
workers: 8
frame_num: {'train': 16, 'test': 16}
resolution: 224
with_mask: false
with_landmark: false

# data augmentation
use_data_augmentation: false
data_aug:
  flip_prob: 0.5
  rotate_prob: 0.5
  rotate_limit: [-10, 10]
  blur_prob: 0.5
  blur_limit: [3, 7]
  brightness_prob: 0.5
  brightness_limit: [-0.1, 0.1]
  contrast_limit: [-0.1, 0.1]
  quality_lower: 40
  quality_upper: 100

# mean and std for normalization
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]

# optimizer config
optimizer:
  type: adam
  adam:
    lr: 0.0002
    beta1: 0.9
    beta2: 0.999
    eps: 0.00000001
    weight_decay: 0.0005
    amsgrad: false

# training config
lr_scheduler: null
nEpochs: 20
start_epoch: 0
save_epoch: 1
rec_iter: 100
logdir: ./logs
manualSeed: 1024
save_ckpt: true
save_feat: true

# loss function
loss_func: cross_entropy
losstype: null

# metric
metric_scoring: auc

# cuda
ngpu: 1
cuda: true
cudnn: true

save_avg: true
save_latest_ckpt: true
```
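A missing key in this file typically fails deep inside training. The sketch below loads the YAML and checks the keys that the Step 2 detector reads (`backbone_name`, `backbone_config`, `loss_func`) plus a few others this README relies on. It then maps the `optimizer.adam` block onto the keyword arguments of `torch.optim.Adam`. How `train.py` really builds its optimizer is not shown here, so treat that mapping, the key list, and the helper names as assumptions. Loading requires PyYAML.

```python
# Assumed minimal key set; extend it as your detector needs.
REQUIRED_KEYS = ("model_name", "backbone_name", "backbone_config", "loss_func",
                 "train_dataset", "test_dataset", "optimizer")


def load_config(path):
    """Load a detector YAML file (needs PyYAML installed)."""
    import yaml
    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)


def check_config(cfg):
    """Raise on missing keys; return kwargs suitable for torch.optim.Adam."""
    missing = [k for k in REQUIRED_KEYS if k not in cfg]
    if missing:
        raise KeyError(f"config is missing keys: {missing}")
    opt = cfg["optimizer"]
    if opt["type"] != "adam":
        raise ValueError("this sketch only handles optimizer.type == 'adam'")
    a = opt["adam"]
    # torch.optim.Adam takes betas as a (beta1, beta2) tuple.
    return {
        "lr": a["lr"],
        "betas": (a["beta1"], a["beta2"]),
        "eps": a["eps"],
        "weight_decay": a["weight_decay"],
        "amsgrad": a["amsgrad"],
    }
```

One YAML detail is worth knowing: PyYAML follows YAML 1.1, where a value such as `1e-8` (no decimal point) loads as the string `'1e-8'` rather than a float. Writing `0.00000001`, as this config does, or `1.0e-8` avoids that.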

### Step 2: Create the detector Python file

Create `training/detectors/my_detector.py`:

```python
import torch
import torch.nn as nn

from metrics.base_metrics_class import calculate_metrics_for_train
from .base_detector import AbstractDetector
from detectors import DETECTOR
from networks import BACKBONE
from loss import LOSSFUNC


@DETECTOR.register_module(module_name='my_detector')
class MyDetector(AbstractDetector):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.backbone = self.build_backbone(config)
        self.loss_func = LOSSFUNC[config['loss_func']]()

    def build_backbone(self, config):
        backbone = BACKBONE[config['backbone_name']](config['backbone_config'])
        return backbone

    def features(self, data_dict: dict) -> torch.Tensor:
        return self.backbone(data_dict['image'])

    def classifier(self, features: torch.Tensor) -> torch.Tensor:
        return self.fc(features)

    def get_losses(self, data_dict: dict, pred_dict: dict) -> dict:
        label = data_dict['label']
        pred = pred_dict['cls']
        loss = self.loss_func(pred, label)
        return {'overall': loss}

    def get_train_metrics(self, data_dict: dict, pred_dict: dict) -> dict:
        label = data_dict['label']
        pred = pred_dict['cls']
        auc, eer, acc, ap = calculate_metrics_for_train(label.detach(), pred.detach())
        return {'acc': acc, 'auc': auc, 'eer': eer, 'ap': ap}

    def forward(self, data_dict: dict, inference=False) -> dict:
        features = self.features(data_dict)
        pred = self.classifier(features)
        prob = torch.softmax(pred, dim=1)[:, 1]
        pred_dict = {'cls': pred, 'prob': prob, 'feat': features}
        return pred_dict
```
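`forward` reduces the two logits to one score with `torch.softmax(pred, dim=1)[:, 1]`, the probability of class index 1. The plain-Python sketch below reproduces that computation for a single sample and shows that, for two classes, it equals the sigmoid of the logit difference. It assumes label 1 means "fake", which this README does not state explicitly, so check your dataset's label convention.

```python
import math


def fake_probability(logits):
    """Softmax over [logit_0, logit_1], returning the class-1 probability."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


def sigmoid(z):
    """Logistic function; fake_probability([a, b]) == sigmoid(b - a)."""
    return 1.0 / (1.0 + math.exp(-z))
```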

### Step 3: Register the detector in `__init__.py`

Add the following import to `training/detectors/__init__.py` so that the module is loaded and the `@DETECTOR.register_module` decorator registers your class:

```python
from .my_detector import MyDetector
```
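A toy registry illustrates the mechanism. It is not the framework's actual `DETECTOR` object, only a stand-in with the same `register_module(module_name=...)` call shape. The entry point presumably looks the class up by the YAML's `model_name`, which is why that value must equal the decorator's `module_name`. The duplicate-name check is a feature of this toy only.

```python
class Registry:
    """Toy name -> class registry, for illustration only."""

    def __init__(self):
        self._modules = {}

    def register_module(self, module_name):
        def decorator(cls):
            if module_name in self._modules:
                raise KeyError(f"{module_name!r} is already registered")
            self._modules[module_name] = cls
            return cls
        return decorator

    def __getitem__(self, name):
        return self._modules[name]

    def __contains__(self, name):
        return name in self._modules


DETECTOR = Registry()


# The decorator runs only when the defining module is imported; without the
# import in __init__.py, 'my_detector' would never be added to the registry.
@DETECTOR.register_module(module_name="my_detector")
class MyDetector:
    pass


config = {"model_name": "my_detector"}  # as in my_detector.yaml
model_class = DETECTOR[config["model_name"]]
```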

That's it! Now you can train and test with your custom detector:

```bash
# Train
python -m torch.distributed.launch --master_port=29503 --nproc_per_node=8 training/train.py \
    --detector_path ./training/config/detector/my_detector.yaml \
    --no-save_feat --ddp

# Test
python -m torch.distributed.launch --master_port=29510 --nproc_per_node=8 training/test_pall.py --ddp \
    --test_dataset "protocol_2_test" "protocol_3_test" \
    --detector_path ./training/config/detector/my_detector.yaml \
    --weights_path logs/my_detector/<your_checkpoint_folder>
```
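To fill in `<your_checkpoint_folder>`, you need the run directory under the detector's `log_dir` (`logs/my_detector` in the config). The example name `clip_large_fft_2025-11-08-13-56-51` suggests one timestamped folder per run. The hypothetical helper below picks the most recently modified subdirectory, which is an assumption about how you want to choose. It raises `FileNotFoundError` if `log_dir` does not exist yet.

```python
from pathlib import Path


def latest_run_dir(log_dir="logs/my_detector"):
    """Return the most recently modified subdirectory of log_dir, or None."""
    runs = [p for p in Path(log_dir).iterdir() if p.is_dir()]
    if not runs:
        return None
    return max(runs, key=lambda p: p.stat().st_mtime)
```

Pass `str(latest_run_dir())` as the `--weights_path` value.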