# DFG - Deepfake Genome Codebase
## 1. Environment Setup
Create and activate the conda environment, then install the dependencies:
```bash
# Create a new conda environment (Python 3.10 recommended)
conda create -n dfg python=3.10 -y
# Activate the environment
conda activate dfg
# Install dependencies
pip install -r requirements.txt
```
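After installation, a quick sanity check can confirm the interpreter version and whether PyTorch sees your GPUs. The helper below is a hypothetical sketch, not part of the repository. It imports `torch` lazily so that it still reports the Python version if PyTorch is missing.

```python
import sys


def check_environment() -> dict:
    """Report the Python version and PyTorch/CUDA availability (hypothetical helper)."""
    report = {
        "python": ".".join(map(str, sys.version_info[:3])),
        "matches_recommended_python": sys.version_info[:2] == (3, 10),
        "torch": None,
        "cuda_available": False,
        "gpu_count": 0,
    }
    try:
        import torch  # imported lazily so the check still runs without PyTorch
    except ImportError:
        return report
    report["torch"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    report["gpu_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

The training and testing commands in this README use `--nproc_per_node=8`, so `gpu_count` should be at least 8 to run them unchanged.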
## 2. Dataset Configuration
Before training or testing, you need to update the **dataset global path** to match your actual data location.
Open `training/dataset/abstract_dataset.py` and modify the `DATASET_GLOBAL_PATH` variable:
```python
# Change this to your actual dataset root path
DATASET_GLOBAL_PATH = "/your/actual/dataset/path/"
```
This path should point to the root directory containing your deepfake detection datasets (e.g., `DeepFakeGenome`, `deepfake_detecton_dataset`, etc.).
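A wrong root usually only surfaces once a data loader starts, so it can help to check the path up front. The helper below is a hypothetical sketch, not part of the codebase. The folder names it looks for are only the examples mentioned above, so set `expected` to the datasets you actually use.

```python
from pathlib import Path


def missing_dataset_dirs(root: str, expected=("DeepFakeGenome",)) -> list:
    """Return the expected dataset folders that are absent under `root`.

    Hypothetical helper: raises FileNotFoundError if `root` itself does not
    exist, otherwise returns a (possibly empty) list of missing folder names.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        raise FileNotFoundError(f"Dataset root not found: {root_path}")
    return [name for name in expected if not (root_path / name).is_dir()]
```

For example, calling `missing_dataset_dirs` with the same value you assigned to `DATASET_GLOBAL_PATH` and getting `[]` back means every expected folder was found.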
## 3. Project and Dataset Structure
```
DFG/
├── preprocessing/
│ └── dataset_json/ # Dataset index JSON files
│ ├── protocol_2_train.json
│ ├── protocol_2_test.json
│ ├── protocol_3_test.json
│ ├── protocol_4_test.json
│ └── ...
├── training/
│ ├── config/
│ │ └── detector/ # Detector config YAML files
│ ├── detectors/ # Detector implementations
│ │ ├── __init__.py # Register all detectors here
│ │ ├── base_detector.py
│ │ └── ...
│ ├── networks/ # Backbone network implementations
│ ├── loss/ # Loss function definitions
│ ├── metrics/ # Evaluation metrics
│ ├── train.py # Training entry point
│ └── test_pall.py # Testing entry point
├── train.sh # Training script examples
├── test.sh # Testing script examples
├── requirements.txt # Python dependencies
└── README.md
```
## 4. Training
Refer to `train.sh` for all training commands. Example:
```bash
python -m torch.distributed.launch --master_port=29503 --nproc_per_node=8 training/train.py \
--detector_path ./training/config/detector/clip_large_fft.yaml \
--no-save_feat --ddp
```
Key arguments:
- `--master_port`: port used by the distributed process group (change it if the port is already in use)
- `--nproc_per_node`: number of processes to launch on this node, one per GPU
- `--detector_path`: path to the detector config YAML
- `--no-save_feat`: disable feature saving during training
- `--ddp`: enable DistributedDataParallel
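If the port passed to `--master_port` is already taken (for example, by a training job and a testing job running at the same time), one option is to let the operating system pick a free port. This is a generic socket sketch, not a helper shipped with the repository. The port is only guaranteed free at the moment of the check, so another process could still claim it before the launch starts.

```python
import socket


def find_free_port() -> int:
    """Ask the OS for a currently unused TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("", 0))  # port 0 lets the kernel choose any free port
        return sock.getsockname()[1]


if __name__ == "__main__":
    print(find_free_port())
```

Saved as `find_free_port.py` (a hypothetical filename), it can be used as `--master_port=$(python find_free_port.py)` in the commands above.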
## 5. Testing
Refer to `test.sh` for all testing commands. Example:
```bash
# Test on protocol 2 & 3
python -m torch.distributed.launch --master_port=29510 --nproc_per_node=8 training/test_pall.py --ddp \
--test_dataset "protocol_2_test" "protocol_3_test" \
--detector_path ./training/config/detector/clip_large_fft.yaml \
--weights_path logs/clip_models/clip_large_fft_2025-11-08-13-56-51
# Test on protocol 4
python -m torch.distributed.launch --master_port=29512 --nproc_per_node=8 training/test_pall.py --ddp \
--test_dataset "protocol_4_test" \
--detector_path ./training/config/detector/clip_large_fft.yaml \
--weights_path logs/clip_models/clip_large_fft_2025-11-08-13-56-51 \
--test_config test_config_p4.yaml
```
Key arguments:
- `--test_dataset`: one or more dataset names (must match JSON filenames under `preprocessing/dataset_json/`)
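- `--weights_path`: path to the trained model's checkpoint directory (the timestamped run folder written during training; replace the example above with your own run)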
- `--test_config`: additional test configuration (required for protocol 4)
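Because each `--test_dataset` name must match a JSON filename under `preprocessing/dataset_json/`, a quick check before launching eight processes can catch typos. The sketch below is hypothetical and simply treats each `*.json` file stem (e.g., `protocol_2_test`) as a valid dataset name.

```python
from pathlib import Path


def available_datasets(json_dir: str = "preprocessing/dataset_json") -> set:
    """Return the dataset names defined by JSON index files (their file stems)."""
    return {p.stem for p in Path(json_dir).glob("*.json")}


def unknown_datasets(names, json_dir: str = "preprocessing/dataset_json") -> list:
    """Return the requested names that have no matching JSON index file."""
    known = available_datasets(json_dir)
    return [name for name in names if name not in known]
```

Run from the repository root, `unknown_datasets(["protocol_2_test", "protocol_3_test"])` returns `[]` when both index files are present.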
## 6. Adding a Custom Detector
To integrate your own detector into the framework, follow these three steps:
### Step 1: Create the detector config YAML
Create a new file under `training/config/detector/`, e.g., `my_detector.yaml`:
```yaml
# log dir
log_dir: logs/my_detector
# model setting
pretrained: null
model_name: my_detector
backbone_name: resnet34
# backbone setting
backbone_config:
  mode: original
  num_classes: 2
  inc: 3
  dropout: false
# dataset
all_dataset: [FaceForensics++, FF-F2F, FF-DF, FF-FS, FF-NT, FaceShifter, DeepFakeDetection, Celeb-DF-v1, Celeb-DF-v2, DFDCP, DFDC, DeeperForensics-1.0, UADFV]
train_dataset: [protocol_2_train]
test_dataset: [protocol_2_test]
compression: c23
train_batchSize: 64
test_batchSize: 64
workers: 8
frame_num: {'train': 16, 'test': 16}
resolution: 224
with_mask: false
with_landmark: false
# data augmentation
use_data_augmentation: false
data_aug:
  flip_prob: 0.5
  rotate_prob: 0.5
  rotate_limit: [-10, 10]
  blur_prob: 0.5
  blur_limit: [3, 7]
  brightness_prob: 0.5
  brightness_limit: [-0.1, 0.1]
  contrast_limit: [-0.1, 0.1]
  quality_lower: 40
  quality_upper: 100
# mean and std for normalization
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
# optimizer config
optimizer:
  type: adam
  adam:
    lr: 0.0002
    beta1: 0.9
    beta2: 0.999
    eps: 0.00000001
    weight_decay: 0.0005
    amsgrad: false
# training config
lr_scheduler: null
nEpochs: 20
start_epoch: 0
save_epoch: 1
rec_iter: 100
logdir: ./logs
manualSeed: 1024
save_ckpt: true
save_feat: true
# loss function
loss_func: cross_entropy
losstype: null
# metric
metric_scoring: auc
# cuda
ngpu: 1
cuda: true
cudnn: true
save_avg: true
save_latest_ckpt: true
```
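Three fields in this file must line up with names registered in code: `backbone_name` is looked up in `BACKBONE` and `loss_func` in `LOSSFUNC` (see Step 2), and `model_name` presumably has to match the `module_name` passed to `@DETECTOR.register_module`. The sketch below is a hypothetical pre-flight check. It compares these fields against sets of registered names you supply (how to list names from the actual registries depends on their implementation) and loads the YAML with PyYAML, which is assumed to be installed.

```python
def config_mismatches(config: dict, detectors, backbones, losses) -> list:
    """List config fields that do not name a registered component.

    `detectors`, `backbones`, and `losses` are containers supporting `in`,
    e.g. sets of names. Hypothetical check, not part of the repository.
    """
    checks = [
        ("model_name", detectors),
        ("backbone_name", backbones),
        ("loss_func", losses),
    ]
    problems = []
    for key, registry in checks:
        value = config.get(key)
        if value not in registry:
            problems.append(f"{key}={value!r} is not registered")
    return problems


def load_config(path: str) -> dict:
    """Load a detector YAML file with PyYAML's safe loader."""
    import yaml  # imported here so config_mismatches() works without PyYAML

    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)
```

For the file above, `config_mismatches(load_config("training/config/detector/my_detector.yaml"), {"my_detector"}, {"resnet34"}, {"cross_entropy"})` would return `[]`.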
### Step 2: Create the detector Python file
Create `training/detectors/my_detector.py`:
```python
import torch
import torch.nn as nn
from metrics.base_metrics_class import calculate_metrics_for_train
from .base_detector import AbstractDetector
from detectors import DETECTOR
from networks import BACKBONE
from loss import LOSSFUNC


@DETECTOR.register_module(module_name='my_detector')
class MyDetector(AbstractDetector):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.backbone = self.build_backbone(config)
        # Binary (real/fake) classification head used by classifier().
        # 512 assumes the backbone returns pooled (N, 512) features, as
        # ResNet-34's penultimate layer does; match it to your backbone.
        self.fc = nn.Linear(512, 2)
        self.loss_func = LOSSFUNC[config['loss_func']]()

    def build_backbone(self, config):
        # Instantiate the backbone registered under `backbone_name`
        # with the `backbone_config` block from the YAML file.
        backbone = BACKBONE[config['backbone_name']](config['backbone_config'])
        return backbone

    def features(self, data_dict: dict) -> torch.Tensor:
        return self.backbone(data_dict['image'])

    def classifier(self, features: torch.Tensor) -> torch.Tensor:
        return self.fc(features)

    def get_losses(self, data_dict: dict, pred_dict: dict) -> dict:
        label = data_dict['label']
        pred = pred_dict['cls']
        loss = self.loss_func(pred, label)
        # 'overall' holds the total loss for this batch
        return {'overall': loss}

    def get_train_metrics(self, data_dict: dict, pred_dict: dict) -> dict:
        label = data_dict['label']
        pred = pred_dict['cls']
        auc, eer, acc, ap = calculate_metrics_for_train(label.detach(), pred.detach())
        return {'acc': acc, 'auc': auc, 'eer': eer, 'ap': ap}

    def forward(self, data_dict: dict, inference=False) -> dict:
        features = self.features(data_dict)
        pred = self.classifier(features)
        # Probability of class 1 (assumed to be "fake") from the two logits
        prob = torch.softmax(pred, dim=1)[:, 1]
        pred_dict = {'cls': pred, 'prob': prob, 'feat': features}
        return pred_dict
```
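Before wiring the detector into distributed training, it can help to check its input/output contract in isolation: `forward` takes a `data_dict` with `'image'` (and `'label'` for the losses) and returns `'cls'` logits, a per-sample `'prob'`, and `'feat'`. The standalone sketch below mimics that contract so it runs without the repository. `TinyBackbone`, the 224x224 input size, treating class 1 as "fake", and `nn.CrossEntropyLoss` standing in for the repository's `cross_entropy` loss are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in that maps images to pooled (N, 512) features."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.conv = nn.Conv2d(3, feat_dim, kernel_size=7, stride=4)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(self.conv(x)).flatten(1)


def smoke_test(batch_size: int = 4) -> dict:
    backbone, fc = TinyBackbone(), nn.Linear(512, 2)
    data_dict = {
        "image": torch.randn(batch_size, 3, 224, 224),
        "label": torch.randint(0, 2, (batch_size,)),
    }
    feat = backbone(data_dict["image"])         # role of MyDetector.features
    logits = fc(feat)                           # role of MyDetector.classifier
    prob = torch.softmax(logits, dim=1)[:, 1]   # probability of class 1
    loss = nn.CrossEntropyLoss()(logits, data_dict["label"])
    return {"cls": logits, "prob": prob, "feat": feat, "loss": loss}


if __name__ == "__main__":
    out = smoke_test()
    print({k: tuple(v.shape) for k, v in out.items()})
```

If the shapes printed for your real backbone differ (for example, a 4D feature map instead of `(N, 512)`), the `nn.Linear` head in `MyDetector` needs pooling or a different input size.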
### Step 3: Register the detector in `__init__.py`
Add the following import line to `training/detectors/__init__.py`:
```python
from .my_detector import MyDetector
```
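The import matters because registration is a side effect of the decorator: `@DETECTOR.register_module(...)` only runs when Python imports `my_detector.py`, and this line in `__init__.py` is what triggers that import. The minimal `Registry` below is a hypothetical illustration of the pattern, not the repository's actual class.

```python
class Registry:
    """Minimal name -> class registry (illustrative, not the repo's implementation)."""

    def __init__(self):
        self._modules = {}

    def register_module(self, module_name=None):
        def _register(cls):
            # Runs once, when the decorated class definition is evaluated,
            # i.e. when the module containing it is imported.
            self._modules[module_name or cls.__name__] = cls
            return cls
        return _register

    def __getitem__(self, name):
        return self._modules[name]

    def __contains__(self, name):
        return name in self._modules


DETECTOR = Registry()


@DETECTOR.register_module(module_name="my_detector")
class MyDetector:
    def __init__(self, config):
        self.config = config


# Look up the class by its registered name and build it from a config dict.
detector = DETECTOR["my_detector"]({"model_name": "my_detector"})
```

If the import line were missing, the decorator would never run, and in a registry like this one a lookup for `my_detector` would fail with a `KeyError`.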
That's it! Now you can train and test with your custom detector:
```bash
# Train
python -m torch.distributed.launch --master_port=29503 --nproc_per_node=8 training/train.py \
--detector_path ./training/config/detector/my_detector.yaml \
--no-save_feat --ddp
# Test
python -m torch.distributed.launch --master_port=29510 --nproc_per_node=8 training/test_pall.py --ddp \
--test_dataset "protocol_2_test" "protocol_3_test" \
--detector_path ./training/config/detector/my_detector.yaml \
--weights_path logs/my_detector/<your_checkpoint_folder>
```
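To fill in `<your_checkpoint_folder>`, you can pick the newest run directory. The example weights path in the Testing section (`clip_large_fft_2025-11-08-13-56-51`) suggests that run folders end in a `YYYY-MM-DD-HH-MM-SS` timestamp. The hypothetical helper below relies on that naming assumption and ignores entries that do not match it.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

# Assumed run-folder suffix, inferred from the example weights path.
_STAMP = re.compile(r"(\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})$")


def latest_run(log_dir: str) -> Optional[Path]:
    """Return the run folder under `log_dir` with the newest timestamp suffix.

    Returns None if no folder matches; raises FileNotFoundError if
    `log_dir` does not exist.
    """
    runs = []
    for path in Path(log_dir).iterdir():
        match = _STAMP.search(path.name)
        if path.is_dir() and match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d-%H-%M-%S")
            runs.append((stamp, path))
    if not runs:
        return None
    return max(runs, key=lambda run: run[0])[1]
```

The result of `latest_run("logs/my_detector")` can then be passed as `--weights_path`.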