	# DFG - Deepfake Genome Codebase

	## 1. Environment Setup

	Create and activate the conda environment:

	```bash
	# Create a new conda environment (Python 3.10 recommended)
	conda create -n dfg python=3.10 -y

	# Activate the environment
	conda activate dfg

	# Install dependencies
	pip install -r requirements.txt
	```

	## 2. Dataset Configuration

	Before training or testing, set the global dataset path to the location of your data.

	Open `training/dataset/abstract_dataset.py` and modify the `DATASET_GLOBAL_PATH` variable:

	```python
	# Change this to your actual dataset root path
	DATASET_GLOBAL_PATH = "/your/actual/dataset/path/"
	```

	This path should point to the root directory containing your deepfake detection datasets (e.g., `DeepFakeGenome`, `deepfake_detecton_dataset`, etc.).
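A wrong root path is easier to catch before a long run starts than after. The sketch below is illustrative only: `check_dataset_root` is a hypothetical helper that is not part of the codebase, and the subdirectory names you pass in are whatever your own dataset root is expected to contain.

```python
from pathlib import Path


def check_dataset_root(root: str, expected_subdirs: list[str]) -> list[str]:
    """Return the expected subdirectories that are missing under `root`.

    Hypothetical helper for illustration; it is not part of the DFG codebase.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        raise FileNotFoundError(f"Dataset root does not exist: {root_path}")
    # Keep only the names that do not exist as directories under the root.
    return [name for name in expected_subdirs if not (root_path / name).is_dir()]
```

For example, `check_dataset_root(DATASET_GLOBAL_PATH, ["DeepFakeGenome"])` returns an empty list when the `DeepFakeGenome` directory is present under the configured root.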

	## 3. Project and Dataset Structure

	```
	DFG/
	├── preprocessing/
	│   └── dataset_json/              # Dataset index JSON files
	│       ├── protocol_2_train.json
	│       ├── protocol_2_test.json
	│       ├── protocol_3_test.json
	│       ├── protocol_4_test.json
	│       └── ...
	├── training/
	│   ├── config/
	│   │   └── detector/              # Detector config YAML files
	│   ├── detectors/                 # Detector implementations
	│   │   ├── __init__.py            # Register all detectors here
	│   │   ├── base_detector.py
	│   │   └── ...
	│   ├── networks/                  # Backbone network implementations
	│   ├── loss/                      # Loss function definitions
	│   ├── metrics/                   # Evaluation metrics
	│   ├── train.py                   # Training entry point
	│   └── test_pall.py               # Testing entry point
	├── train.sh                       # Training script examples
	├── test.sh                        # Testing script examples
	├── requirements.txt               # Python dependencies
	└── README.md
	```

	## 4. Training

	Refer to `train.sh` for all training commands. Example:

	```bash
	python -m torch.distributed.launch --master_port=29503 --nproc_per_node=8 training/train.py \
	--detector_path ./training/config/detector/clip_large_fft.yaml \
	--no-save_feat --ddp
	```

	Key arguments:
	- `--master_port`: port for distributed training (change if port conflicts occur)
	- `--nproc_per_node`: number of processes to launch per node (one per GPU)
	- `--detector_path`: path to the detector config YAML
	- `--no-save_feat`: disable feature saving during training
	- `--ddp`: enable DistributedDataParallel
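If the default `--master_port` is already in use, one way to pick a replacement is to ask the operating system for an unused port. This is a minimal sketch using only the standard library; `find_free_port` is a hypothetical helper, not something the codebase provides.

```python
import socket


def find_free_port() -> int:
    """Ask the OS for a TCP port on localhost that is currently unused."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the OS choose any free port.
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    print(find_free_port())
```

The port is released as soon as the function returns, so another process could claim it before training starts. Treat the result as a good candidate rather than a guarantee.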

	## 5. Testing

	Refer to `test.sh` for all testing commands. Example:

	```bash
	# Test on protocol 2 & 3
	python -m torch.distributed.launch --master_port=29510 --nproc_per_node=8 training/test_pall.py --ddp \
	--test_dataset "protocol_2_test" "protocol_3_test" \
	--detector_path ./training/config/detector/clip_large_fft.yaml \
	--weights_path logs/clip_models/clip_large_fft_2025-11-08-13-56-51

	# Test on protocol 4
	python -m torch.distributed.launch --master_port=29512 --nproc_per_node=8 training/test_pall.py --ddp \
	--test_dataset "protocol_4_test" \
	--detector_path ./training/config/detector/clip_large_fft.yaml \
	--weights_path logs/clip_models/clip_large_fft_2025-11-08-13-56-51 \
	--test_config test_config_p4.yaml
	```

	Key arguments:
	- `--test_dataset`: one or more dataset names (must match JSON filenames under `preprocessing/dataset_json/`)
	- `--weights_path`: path to trained model checkpoint directory
	- `--test_config`: additional test configuration (required for protocol 4)
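Because each `--test_dataset` name must match a JSON filename, a small pre-flight check can report names that have no index file. The sketch below assumes the index for a dataset named `X` is stored as `X.json` directly under `preprocessing/dataset_json/`; `missing_dataset_indexes` is a hypothetical helper introduced here for illustration.

```python
from pathlib import Path


def missing_dataset_indexes(
    names: list[str], json_dir: str = "preprocessing/dataset_json"
) -> list[str]:
    """Return the dataset names that have no `<name>.json` file in `json_dir`.

    Hypothetical helper; assumes one JSON index file per dataset name.
    """
    index_dir = Path(json_dir)
    return [name for name in names if not (index_dir / f"{name}.json").is_file()]
```

Calling `missing_dataset_indexes(["protocol_2_test", "protocol_3_test"])` from the repository root returns an empty list when both index files exist.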

	## 6. Adding a Custom Detector

	To integrate your own detector into the framework, follow these three steps:

	### Step 1: Create the detector config YAML

	Create a new file under `training/config/detector/`, e.g., `my_detector.yaml`:

	```yaml
	# log dir
	log_dir: logs/my_detector

	# model setting
	pretrained: null
	model_name: my_detector
	backbone_name: resnet34

	# backbone setting
	backbone_config:
	  mode: original
	  num_classes: 2
	  inc: 3
	  dropout: false

	# dataset
	all_dataset: [FaceForensics++, FF-F2F, FF-DF, FF-FS, FF-NT, FaceShifter, DeepFakeDetection, Celeb-DF-v1, Celeb-DF-v2, DFDCP, DFDC, DeeperForensics-1.0, UADFV]
	train_dataset: [protocol_2_train]
	test_dataset: [protocol_2_test]

	compression: c23
	train_batchSize: 64
	test_batchSize: 64
	workers: 8
	frame_num: {'train': 16, 'test': 16}
	resolution: 224
	with_mask: false
	with_landmark: false

	# data augmentation
	use_data_augmentation: false
	data_aug:
	  flip_prob: 0.5
	  rotate_prob: 0.5
	  rotate_limit: [-10, 10]
	  blur_prob: 0.5
	  blur_limit: [3, 7]
	  brightness_prob: 0.5
	  brightness_limit: [-0.1, 0.1]
	  contrast_limit: [-0.1, 0.1]
	  quality_lower: 40
	  quality_upper: 100

	# mean and std for normalization
	mean: [0.485, 0.456, 0.406]
	std: [0.229, 0.224, 0.225]

	# optimizer config
	optimizer:
	  type: adam
	  adam:
	    lr: 0.0002
	    beta1: 0.9
	    beta2: 0.999
	    eps: 0.00000001
	    weight_decay: 0.0005
	    amsgrad: false

	# training config
	lr_scheduler: null
	nEpochs: 20
	start_epoch: 0
	save_epoch: 1
	rec_iter: 100
	logdir: ./logs
	manualSeed: 1024
	save_ckpt: true
	save_feat: true

	# loss function
	loss_func: cross_entropy
	losstype: null

	# metric
	metric_scoring: auc

	# cuda
	ngpu: 1
	cuda: true
	cudnn: true

	save_avg: true
	save_latest_ckpt: true
	```
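YAML nesting depends on indentation, so a config whose nested keys have slipped to the top level can still load without error while leaving a key such as `backbone_config` empty. The sketch below checks an already-loaded config dictionary for that kind of problem. The list of required keys is an assumption based on the example config, and `check_detector_config` is a hypothetical helper, not part of the codebase.

```python
def check_detector_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems found in a detector config.

    Hypothetical helper; the required keys below are an assumption taken
    from the example YAML, not a definitive schema.
    """
    required = (
        "model_name", "backbone_name", "backbone_config",
        "train_dataset", "test_dataset", "loss_func", "optimizer",
    )
    problems = [f"missing or empty key: {key}" for key in required if cfg.get(key) is None]

    optimizer = cfg.get("optimizer")
    if isinstance(optimizer, dict):
        opt_type = optimizer.get("type")
        # The settings for the selected optimizer should be nested under its name.
        if opt_type is not None and not isinstance(optimizer.get(opt_type), dict):
            problems.append(
                f"optimizer.type is '{opt_type}' but optimizer.{opt_type} is not a mapping"
            )
    return problems
```

Assuming PyYAML is installed, the dictionary can be obtained with `yaml.safe_load(open(path))`. An empty result means none of the checked problems were found.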

	### Step 2: Create the detector Python file

	Create `training/detectors/my_detector.py`:

	```python
	import torch
	import torch.nn as nn

	from metrics.base_metrics_class import calculate_metrics_for_train
	from .base_detector import AbstractDetector
	from detectors import DETECTOR
	from networks import BACKBONE
	from loss import LOSSFUNC


	@DETECTOR.register_module(module_name='my_detector')
	class MyDetector(AbstractDetector):
	    def __init__(self, config):
	        super().__init__()
	        self.config = config
	        self.backbone = self.build_backbone(config)
	        # Classification head used by classifier(). Assumption: the backbone
	        # returns flat 512-dim features (the pooled width of a standard
	        # ResNet-34); adjust in_features to match your backbone's output.
	        self.fc = nn.Linear(512, config['backbone_config']['num_classes'])
	        self.loss_func = LOSSFUNC[config['loss_func']]()

	    def build_backbone(self, config):
	        # Look up the backbone class by name and build it from its config block
	        backbone = BACKBONE[config['backbone_name']](config['backbone_config'])
	        return backbone

	    def features(self, data_dict: dict) -> torch.Tensor:
	        return self.backbone(data_dict['image'])

	    def classifier(self, features: torch.Tensor) -> torch.Tensor:
	        return self.fc(features)

	    def get_losses(self, data_dict: dict, pred_dict: dict) -> dict:
	        label = data_dict['label']
	        pred = pred_dict['cls']
	        loss = self.loss_func(pred, label)
	        return {'overall': loss}

	    def get_train_metrics(self, data_dict: dict, pred_dict: dict) -> dict:
	        label = data_dict['label']
	        pred = pred_dict['cls']
	        auc, eer, acc, ap = calculate_metrics_for_train(label.detach(), pred.detach())
	        return {'acc': acc, 'auc': auc, 'eer': eer, 'ap': ap}

	    def forward(self, data_dict: dict, inference=False) -> dict:
	        features = self.features(data_dict)
	        pred = self.classifier(features)
	        # Probability of class index 1
	        prob = torch.softmax(pred, dim=1)[:, 1]
	        pred_dict = {'cls': pred, 'prob': prob, 'feat': features}
	        return pred_dict
	```

	### Step 3: Register the detector in `__init__.py`

	Add the following import line to `training/detectors/__init__.py`:

	```python
	from .my_detector import MyDetector
	```
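The import matters because a decorator such as `@DETECTOR.register_module(...)` only runs when the module that contains it is imported. The following is a minimal, self-contained sketch of that registry pattern. It is not the codebase's actual `DETECTOR` implementation, and looking the class up by the config's `model_name` is an assumption made for illustration.

```python
class Registry:
    """Maps a string name to a class; entries are added as decorated classes are imported."""

    def __init__(self):
        self.data = {}

    def register_module(self, module_name=None):
        def decorator(cls):
            # Fall back to the class name when no explicit name is given.
            name = module_name or cls.__name__
            self.data[name] = cls
            return cls
        return decorator

    def __getitem__(self, key):
        return self.data[key]


# Stand-in for the framework's registry (illustrative only).
DETECTOR = Registry()


@DETECTOR.register_module(module_name="my_detector")
class MyDetector:
    def __init__(self, config):
        self.config = config


# Assumed lookup: the name stored in the config selects the registered class.
config = {"model_name": "my_detector"}
model = DETECTOR[config["model_name"]](config)
print(type(model).__name__)  # prints "MyDetector"
```

If the file defining `MyDetector` were never imported, the decorator would never execute and the lookup would raise a `KeyError`.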

	That's it! Now you can train and test with your custom detector:

	```bash
	# Train
	python -m torch.distributed.launch --master_port=29503 --nproc_per_node=8 training/train.py \
	--detector_path ./training/config/detector/my_detector.yaml \
	--no-save_feat --ddp

	# Test
	python -m torch.distributed.launch --master_port=29510 --nproc_per_node=8 training/test_pall.py --ddp \
	--test_dataset "protocol_2_test" "protocol_3_test" \
	--detector_path ./training/config/detector/my_detector.yaml \
	--weights_path logs/my_detector/<your_checkpoint_folder>
	```