# DFG - Deepfake Genome Codebase

## 1. Environment Setup

Create and activate the conda environment:

```bash
# Create a new conda environment (Python 3.10 recommended)
conda create -n dfg python=3.10 -y

# Activate the environment
conda activate dfg

# Install dependencies
pip install -r requirements.txt
```
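Before moving on, it can help to confirm that the interpreter version and the packages listed in `requirements.txt` are actually present. The sketch below uses hypothetical helpers (`missing_requirements` and `check_environment` are not part of the repository). It only parses simple `name`, `name==x.y` or `name>=x.y` lines and looks them up with `importlib.metadata`, so it ignores version constraints and is no substitute for `pip check`.

```python
import sys
from importlib import metadata


def missing_requirements(lines):
    """Return names from requirements.txt-style lines that are not installed.

    Only simple entries such as 'name', 'name==x.y' or 'name>=x.y' are handled;
    version constraints are ignored.
    """
    missing = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options (-r, -f, ...)
            continue
        name = line
        # Cut at the first version specifier, extras bracket, marker or space.
        for sep in ("==", ">=", "<=", "~=", "!=", ">", "<", "[", ";", " "):
            name = name.split(sep, 1)[0]
        name = name.strip()
        if not name:
            continue
        try:
            metadata.version(name)  # looks up the installed *distribution* name
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def check_environment(requirements_path="requirements.txt"):
    """Print a short report and return the list of missing requirements."""
    if sys.version_info[:2] != (3, 10):
        print(f"note: running Python {sys.version.split()[0]}, 3.10 is recommended")
    with open(requirements_path) as fh:
        missing = missing_requirements(fh)
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("all listed requirements are installed")
    return missing
```

Looking names up with `importlib.metadata` rather than trying to `import` them avoids mismatches between distribution and module names (for example, `opencv-python` installs the `cv2` module). Call `check_environment()` from the repository root inside the activated `dfg` environment.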

## 2. Dataset Configuration

Before training or testing, you need to update the **dataset global path** to match your actual data location.

Open `training/dataset/abstract_dataset.py` and modify the `DATASET_GLOBAL_PATH` variable:

```python
# Change this to your actual dataset root path
DATASET_GLOBAL_PATH = "/your/actual/dataset/path/"
```

This path should point to the root directory that contains your deepfake detection datasets (e.g., `DeepFakeGenome`, `deepfake_detecton_dataset`).
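A quick sanity check of the path you plan to use can save a failed run. The helper below is a hypothetical sketch, not part of the codebase. It verifies that the directory exists, that it ends with `/` like the example value, and that the dataset folders you expect are present. The trailing-slash check rests on an unverified assumption: that `abstract_dataset.py` may append relative paths by plain string concatenation.

```python
import os


def check_dataset_root(root, expected_subdirs=("DeepFakeGenome",)):
    """Return a list of problems with a candidate DATASET_GLOBAL_PATH (empty = OK).

    The expected sub-folder names are examples; pass the datasets you actually use.
    """
    if not os.path.isdir(root):
        return [f"{root!r} is not an existing directory"]
    problems = []
    if not root.endswith("/"):
        # Assumption: the example value ends in '/', so relative paths might be
        # joined by string concatenation; a missing slash would then break them.
        problems.append(f"{root!r} does not end with '/' like the example value")
    for name in expected_subdirs:
        if not os.path.isdir(os.path.join(root, name)):
            problems.append(f"missing sub-folder {name!r} in {root!r}")
    return problems
```

For example, `check_dataset_root("/your/actual/dataset/path/")` returns an empty list once the path exists and contains `DeepFakeGenome`.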

## 3. Project and Dataset Structure

```
DFG/
├── preprocessing/
│   └── dataset_json/          # Dataset index JSON files
│       ├── protocol_2_train.json
│       ├── protocol_2_test.json
│       ├── protocol_3_test.json
│       ├── protocol_4_test.json
│       └── ...
├── training/
│   ├── config/
│   │   └── detector/          # Detector config YAML files
│   ├── detectors/             # Detector implementations
│   │   ├── __init__.py        # Register all detectors here
│   │   ├── base_detector.py
│   │   └── ...
│   ├── networks/              # Backbone network implementations
│   ├── loss/                  # Loss function definitions
│   ├── metrics/               # Evaluation metrics
│   ├── train.py               # Training entry point
│   └── test_pall.py           # Testing entry point
├── train.sh                   # Training script examples
├── test.sh                    # Testing script examples
├── requirements.txt           # Python dependencies
└── README.md
```

## 4. Training

Refer to `train.sh` for all training commands. Example:

```bash
python -m torch.distributed.launch --master_port=29503 --nproc_per_node=8 training/train.py \
    --detector_path ./training/config/detector/clip_large_fft.yaml \
    --no-save_feat --ddp
```

Key arguments:
- `--master_port`: port for distributed training (change if port conflicts occur)
- `--nproc_per_node`: number of GPUs
- `--detector_path`: path to the detector config YAML
- `--no-save_feat`: disable feature saving during training
- `--ddp`: enable DistributedDataParallel
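If the default port is already taken, one way to pick a free port for `--master_port` is to let the operating system choose one. This is a generic sketch using the standard `socket` module, not something the repository provides; the port is only known to be free at the moment of the check, so another process could still claim it before training starts.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Return a TCP port that is currently unused on this machine.

    Binding to port 0 asks the kernel to assign any free ephemeral port; the
    port is released again when the socket is closed on leaving the block.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    print(find_free_port())
```

For example, save it as `free_port.py` (a hypothetical file name) and pass `--master_port=$(python free_port.py)` to the launch command.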

## 5. Testing

Refer to `test.sh` for all testing commands. Example:

```bash
# Test on protocol 2 & 3
python -m torch.distributed.launch --master_port=29510 --nproc_per_node=8 training/test_pall.py --ddp \
    --test_dataset "protocol_2_test" "protocol_3_test" \
    --detector_path ./training/config/detector/clip_large_fft.yaml \
    --weights_path logs/clip_models/clip_large_fft_2025-11-08-13-56-51

# Test on protocol 4
python -m torch.distributed.launch --master_port=29512 --nproc_per_node=8 training/test_pall.py --ddp \
    --test_dataset "protocol_4_test" \
    --detector_path ./training/config/detector/clip_large_fft.yaml \
    --weights_path logs/clip_models/clip_large_fft_2025-11-08-13-56-51 \
    --test_config test_config_p4.yaml
```

Key arguments:
- `--test_dataset`: one or more dataset names; each must match a JSON file name under `preprocessing/dataset_json/` without the `.json` extension (e.g., `protocol_2_test` for `protocol_2_test.json`)
- `--weights_path`: path to trained model checkpoint directory
- `--test_config`: additional test configuration (required for protocol 4)
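Because `--test_dataset` names are matched against the JSON index files, a mistyped name can be caught before launching eight processes. The hypothetical helpers below (not part of the repository) strip the `.json` extension from the file names in `preprocessing/dataset_json/` and report requested names that have no matching file. They assume the one-file-per-dataset naming shown in the project structure.

```python
import os


def dataset_names(filenames):
    """Map JSON index file names to the names accepted by --test_dataset."""
    return sorted(os.path.splitext(f)[0] for f in filenames if f.endswith(".json"))


def unknown_datasets(requested, known):
    """Return the requested names that have no matching '<name>.json' index file."""
    known = set(known)
    return [name for name in requested if name not in known]
```

From the repository root, `unknown_datasets(["protocol_2_test", "protocol_3_test"], dataset_names(os.listdir("preprocessing/dataset_json")))` should return an empty list when both index files are present. Keeping the file listing outside the two functions makes them easy to test without touching the file system.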

## 6. Adding a Custom Detector

To integrate your own detector into the framework, follow these three steps:

### Step 1: Create the detector config YAML

Create a new file under `training/config/detector/`, e.g., `my_detector.yaml`:

```yaml
# log dir
log_dir: logs/my_detector

# model setting
pretrained: null
model_name: my_detector
backbone_name: resnet34

# backbone setting
backbone_config:
  mode: original
  num_classes: 2
  inc: 3
  dropout: false

# dataset
all_dataset: [FaceForensics++, FF-F2F, FF-DF, FF-FS, FF-NT, FaceShifter, DeepFakeDetection, Celeb-DF-v1, Celeb-DF-v2, DFDCP, DFDC, DeeperForensics-1.0, UADFV]
train_dataset: [protocol_2_train]
test_dataset: [protocol_2_test]

compression: c23
train_batchSize: 64
test_batchSize: 64
workers: 8
frame_num: {'train': 16, 'test': 16}
resolution: 224
with_mask: false
with_landmark: false

# data augmentation
use_data_augmentation: false
data_aug:
  flip_prob: 0.5
  rotate_prob: 0.5
  rotate_limit: [-10, 10]
  blur_prob: 0.5
  blur_limit: [3, 7]
  brightness_prob: 0.5
  brightness_limit: [-0.1, 0.1]
  contrast_limit: [-0.1, 0.1]
  quality_lower: 40
  quality_upper: 100

# mean and std for normalization
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]

# optimizer config
optimizer:
  type: adam
  adam:
    lr: 0.0002
    beta1: 0.9
    beta2: 0.999
    eps: 0.00000001
    weight_decay: 0.0005
    amsgrad: false

# training config
lr_scheduler: null
nEpochs: 20
start_epoch: 0
save_epoch: 1
rec_iter: 100
logdir: ./logs
manualSeed: 1024
save_ckpt: true
save_feat: true

# loss function
loss_func: cross_entropy
losstype: null

# metric
metric_scoring: auc

# cuda
ngpu: 1
cuda: true
cudnn: true

save_avg: true
save_latest_ckpt: true
```
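Several fields in this file depend on each other. The sketch below checks a few of those relationships on the parsed config dict. The rules are assumptions drawn from this example rather than checks the framework is known to perform. In particular, it assumes that the detector is looked up by `model_name` (the matching `my_detector` names in this step and the next suggest this) and that `optimizer.type` selects the sub-section of the same name.

```python
def validate_detector_config(cfg, registered_name):
    """Return a list of problems in a detector config dict (empty list = OK)."""
    problems = []
    # Assumed: the detector is looked up by model_name, so it should equal the
    # module_name given to @DETECTOR.register_module.
    if cfg.get("model_name") != registered_name:
        problems.append(
            f"model_name {cfg.get('model_name')!r} != registered name {registered_name!r}"
        )
    for key in ("train_dataset", "test_dataset"):
        if not cfg.get(key):
            problems.append(f"{key} should be a non-empty list")
    # Assumed: 'type: adam' selects the 'adam:' sub-section for hyperparameters.
    optimizer = cfg.get("optimizer") or {}
    opt_type = optimizer.get("type")
    if opt_type not in optimizer:
        problems.append(f"optimizer.type {opt_type!r} has no matching sub-section")
    # Normalization statistics need one value per input channel.
    channels = (cfg.get("backbone_config") or {}).get("inc", 3)
    for key in ("mean", "std"):
        if len(cfg.get(key) or []) != channels:
            problems.append(f"{key} should have {channels} values (backbone_config.inc)")
    return problems
```

With PyYAML installed, `yaml.safe_load` turns the YAML above into such a dict, e.g. `validate_detector_config(yaml.safe_load(open("training/config/detector/my_detector.yaml")), "my_detector")`.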

### Step 2: Create the detector Python file

Create `training/detectors/my_detector.py`:

```python
import torch
import torch.nn as nn

from metrics.base_metrics_class import calculate_metrics_for_train
from .base_detector import AbstractDetector
from detectors import DETECTOR
from networks import BACKBONE
from loss import LOSSFUNC


@DETECTOR.register_module(module_name='my_detector')  # same name as model_name in my_detector.yaml
class MyDetector(AbstractDetector):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.backbone = self.build_backbone(config)
        self.loss_func = self.build_loss(config)

    def build_backbone(self, config):
        # Look up the backbone class by name (e.g. 'resnet34') and build it from backbone_config
        backbone = BACKBONE[config['backbone_name']](config['backbone_config'])
        return backbone

    def build_loss(self, config):
        # Look up the loss class by name (e.g. 'cross_entropy') and instantiate it
        return LOSSFUNC[config['loss_func']]()

    def features(self, data_dict: dict) -> torch.Tensor:
        # Assumes the backbone exposes features()/classifier(). If yours only defines
        # forward(), build your own head (e.g. an nn.Linear) in __init__ instead.
        return self.backbone.features(data_dict['image'])

    def classifier(self, features: torch.Tensor) -> torch.Tensor:
        # Map features to 2-class logits (backbone_config.num_classes)
        return self.backbone.classifier(features)

    def get_losses(self, data_dict: dict, pred_dict: dict) -> dict:
        label = data_dict['label']
        pred = pred_dict['cls']
        loss = self.loss_func(pred, label)
        return {'overall': loss}

    def get_train_metrics(self, data_dict: dict, pred_dict: dict) -> dict:
        label = data_dict['label']
        pred = pred_dict['cls']
        auc, eer, acc, ap = calculate_metrics_for_train(label.detach(), pred.detach())
        return {'acc': acc, 'auc': auc, 'eer': eer, 'ap': ap}

    def forward(self, data_dict: dict, inference=False) -> dict:
        features = self.features(data_dict)
        pred = self.classifier(features)
        # Probability of class index 1, taken from the softmax over the two logits
        prob = torch.softmax(pred, dim=1)[:, 1]
        pred_dict = {'cls': pred, 'prob': prob, 'feat': features}
        return pred_dict
```
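In `forward`, `prob` is the softmax probability of class index 1 computed from the two logits $z = (z_0, z_1)$ in `pred`. For two classes this reduces to a sigmoid of the logit difference, and since the other column is simply $1 - p$, keeping one column loses no information. Which label index 1 stands for follows the dataset's labeling and is not specified here.

```math
p = \operatorname{softmax}(z)_1 = \frac{e^{z_1}}{e^{z_0} + e^{z_1}} = \frac{1}{1 + e^{-(z_1 - z_0)}} = \sigma(z_1 - z_0)
```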

### Step 3: Register the detector in `__init__.py`

Add the following import line to `training/detectors/__init__.py`:

```python
from .my_detector import MyDetector
```
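This import matters because a decorator such as `@DETECTOR.register_module(...)` only runs when the class statement executes, that is, when its module is imported. The self-contained sketch below imitates that pattern with a hypothetical `Registry` class; the framework's real `DETECTOR` object may differ in its details (for example, in how it handles duplicate names).

```python
class Registry:
    """Hypothetical name-to-class registry mimicking DETECTOR.register_module."""

    def __init__(self):
        self._modules = {}

    def register_module(self, module_name=None):
        def decorator(cls):
            name = module_name or cls.__name__
            if name in self._modules:
                raise KeyError(f"{name!r} is already registered")
            self._modules[name] = cls
            return cls  # hand the class back unchanged
        return decorator

    def __getitem__(self, name):
        return self._modules[name]

    def __contains__(self, name):
        return name in self._modules


DETECTOR = Registry()


# This decorator call happens when the class statement runs. In the real layout
# the class lives in training/detectors/my_detector.py, so that only happens once
# the module is imported, which is what the line added to __init__.py ensures.
@DETECTOR.register_module(module_name="my_detector")
class MyDetector:
    def __init__(self, config):
        self.config = config


# A training script could then build the detector from its config by name.
config = {"model_name": "my_detector"}
detector = DETECTOR[config["model_name"]](config)
print(type(detector).__name__)  # MyDetector
```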

That's it! Now you can train and test with your custom detector:

```bash
# Train
python -m torch.distributed.launch --master_port=29503 --nproc_per_node=8 training/train.py \
    --detector_path ./training/config/detector/my_detector.yaml \
    --no-save_feat --ddp

# Test
python -m torch.distributed.launch --master_port=29510 --nproc_per_node=8 training/test_pall.py --ddp \
    --test_dataset "protocol_2_test" "protocol_3_test" \
    --detector_path ./training/config/detector/my_detector.yaml \
    --weights_path logs/my_detector/<your_checkpoint_folder>
```
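To fill in `<your_checkpoint_folder>`, the newest run can be found by its timestamp. This sketch assumes that run folders end in a `_YYYY-MM-DD-HH-MM-SS` suffix, as in the example `clip_large_fft_2025-11-08-13-56-51`, and that they are created under the YAML's `log_dir`; both are inferences from this README, and the helper names are hypothetical.

```python
import os
import re
from datetime import datetime

# Matches a trailing timestamp such as '_2025-11-08-13-56-51' (assumed naming,
# based on the example --weights_path in this README).
_STAMP = re.compile(r"_(\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})$")


def run_timestamp(dirname):
    """Parse the trailing timestamp of a run folder name, or return None."""
    m = _STAMP.search(dirname)
    return datetime.strptime(m.group(1), "%Y-%m-%d-%H-%M-%S") if m else None


def latest_run(names):
    """Return the name with the newest timestamp, or None if none is timestamped."""
    stamped = [(run_timestamp(n), n) for n in names]
    stamped = [(t, n) for t, n in stamped if t is not None]
    return max(stamped)[1] if stamped else None


def latest_run_dir(log_dir):
    """Return the full path of the newest timestamped sub-folder of log_dir, or None."""
    names = [n for n in os.listdir(log_dir) if os.path.isdir(os.path.join(log_dir, n))]
    best = latest_run(names)
    return os.path.join(log_dir, best) if best else None
```

For example, `latest_run_dir("logs/my_detector")` returns a path that can be passed as `--weights_path`, or `None` if no timestamped folder is found.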