deepfake-detector-efficientnet-b4-contrastive
A deepfake image detector built on EfficientNet-B4 and trained with a hybrid loss that combines cross-entropy with a contrastive term. The contrastive objective organizes the feature space so that images produced by different manipulation techniques (face swapping, reenactment, attribute editing) cluster together and away from authentic images, with the goal of improving generalization to unseen datasets.
The model is based on the Self-Blended Images (SBI) training framework (Shiohara & Yamasaki, CVPR 2022) and extends it with a triplet-style contrastive loss.
Model description
The model uses an EfficientNet-B4 backbone to turn a 380×380 face crop into a 1792-channel feature map. Global average pooling reduces this map to a 1792-dimensional feature vector, and a linear classifier maps that vector to two logits (real, fake).
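The pooling and classification steps described above can be sketched with a random stand-in tensor in place of the backbone output. The 12×12 spatial size is an assumption used for illustration; only the channel count (1792) and the two-logit head come from this card.

```python
import torch
import torch.nn as nn

# Stand-in for the backbone output: (batch, channels, height, width).
# The 12x12 spatial size is an assumption, not a value stated in this card.
feature_map = torch.randn(4, 1792, 12, 12)

global_pool = nn.AdaptiveAvgPool2d(1)  # averages each channel over height and width
classifier = nn.Linear(1792, 2)        # index 0 = real, index 1 = fake

pooled = global_pool(feature_map).flatten(1)  # shape: (4, 1792)
logits = classifier(pooled)                   # shape: (4, 2)
```

Because the pooling layer is adaptive, the head accepts any spatial size the backbone produces.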
Training uses a hybrid loss:
L_total = 0.7 · L_ce + 0.3 · L_contr
The contrastive loss operates at the batch level. For each fake anchor feature f_a, it identifies the nearest fake neighbor as the positive (f_p) and the nearest real neighbor as the negative (f_n), then minimizes:
L_contr = mean( max(0, d(f_a, f_p) - d(f_a, f_n) + m) )
where d is the Euclidean distance and the margin is m = 1.0.
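The loss defined above could be written in PyTorch roughly as follows. This is a minimal sketch, not the training code behind this model: the function names are hypothetical, the loss is assumed to act on the pooled 1792-dimensional features, an anchor is assumed never to be its own positive, and batches without at least two fakes and one real are assumed to contribute zero.

```python
import torch
import torch.nn.functional as F


def batch_contrastive_loss(features, labels, margin=1.0):
    """Triplet-style loss with in-batch nearest-neighbor mining.

    features: (B, D) float tensor of pooled features.
    labels:   (B,) long tensor, 1 = fake, 0 = real.
    """
    fake = features[labels == 1]
    real = features[labels == 0]
    # Assumption: a positive needs a second fake, a negative needs one real.
    if fake.size(0) < 2 or real.size(0) < 1:
        return features.sum() * 0.0  # zero that stays connected to the graph

    d_ff = torch.cdist(fake, fake)  # fake-to-fake Euclidean distances
    d_fr = torch.cdist(fake, real)  # fake-to-real Euclidean distances
    # Assumption: mask the diagonal so an anchor is not its own positive.
    eye = torch.eye(fake.size(0), dtype=torch.bool, device=features.device)
    d_ff = d_ff.masked_fill(eye, float("inf"))

    d_pos = d_ff.min(dim=1).values  # nearest fake neighbor per anchor
    d_neg = d_fr.min(dim=1).values  # nearest real neighbor per anchor
    return F.relu(d_pos - d_neg + margin).mean()


def hybrid_loss(logits, features, labels, w_ce=0.7, w_contr=0.3):
    """L_total = 0.7 * L_ce + 0.3 * L_contr, using the weights from this card."""
    ce = F.cross_entropy(logits, labels)
    contr = batch_contrastive_loss(features, labels)
    return w_ce * ce + w_contr * contr
```

Mining the nearest fake as the positive and the nearest real as the negative means each anchor is only penalized when a real sample sits closer than its closest fake neighbor plus the margin.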
Intended use
- Detecting AI-generated or manipulated face images
- Research on face forgery detection and generalization
- Cross-dataset evaluation benchmarks
Not intended for: real-time video analysis (the model scores individual frames only), non-face images, or use as the sole arbiter of image authenticity.
Evaluation results
Cross-dataset evaluation: the model was trained on SBI synthetic data derived from FaceForensics++ and evaluated on two held-out datasets, CelebDF-v2 and FFIW, without any fine-tuning.
| Dataset | Method | AUC | Accuracy |
|---|---|---|---|
| CelebDF-v2 | SBI + contrastive (ours) | 0.9396 | 0.8784 |
| CelebDF-v2 | SBI baseline (our impl.) | 0.9385 | 0.8571 |
| CelebDF-v2 | SBI paper (reported) | 0.9318 | N/A |
| FFIW | SBI + contrastive (ours) | 0.8275 | 0.6320 |
| FFIW | SBI baseline (our impl.) | 0.8122 | 0.6420 |
| FFIW | SBI paper (reported) | 0.8483 | N/A |
The contrastive-enhanced model achieves a higher AUC than our SBI baseline implementation on both datasets: 0.9396 vs. 0.9385 on CelebDF-v2 and 0.8275 vs. 0.8122 on FFIW. On CelebDF-v2 it also exceeds the AUC reported in the SBI paper (0.9318); on FFIW it remains below the reported 0.8483. Accuracy is mixed: it improves on CelebDF-v2 (0.8784 vs. 0.8571) but is slightly lower on FFIW (0.6320 vs. 0.6420). We attribute the gap between our baseline and the originally reported numbers to differences in training conditions and implementation details, a common challenge when reproducing deep learning results. The AUC gains from adding contrastive learning are nonetheless consistent with the hypothesis that structuring the feature space around shared forgery patterns improves cross-dataset generalization, although the margin on CelebDF-v2 is small.
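For readers who want to compute the same two metrics on their own predictions, the following is a minimal sketch. The function name is hypothetical, and the 0.5 decision threshold for accuracy is an assumption, since this card does not state the threshold or whether scores were aggregated per video.

```python
import torch


def auc_and_accuracy(scores, labels, threshold=0.5):
    """Compute ROC AUC and thresholded accuracy.

    scores: fake probabilities in [0, 1]; labels: 1 = fake, 0 = real.
    """
    scores = torch.as_tensor(scores, dtype=torch.float64)
    labels = torch.as_tensor(labels, dtype=torch.long)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # AUC equals the probability that a fake scores higher than a real,
    # with ties counted as one half. This pairwise form uses O(P * N) memory.
    diff = pos[:, None] - neg[None, :]
    auc = ((diff > 0).double() + 0.5 * (diff == 0).double()).mean().item()
    # Accuracy at a fixed threshold (0.5 is an assumption).
    preds = (scores >= threshold).long()
    acc = (preds == labels).double().mean().item()
    return auc, acc
```

AUC depends only on how the scores rank fakes against reals, while accuracy depends on the chosen threshold, which is one reason the two columns in the table can move in different directions.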
Training details
| Parameter | Value |
|---|---|
| Backbone | EfficientNet-B4 (advprop) |
| Input size | 380 × 380 |
| Optimizer | SAM (base: SGD, momentum 0.9) |
| Learning rate | 1e-3 with LinearDecayLR |
| Epochs | 100 |
| Batch size | 20 |
| Framework | PyTorch |
| Face detector | RetinaFace (resnet50) |
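The optimizer row above names SAM (sharpness-aware minimization) wrapped around SGD. SAM is not part of core PyTorch, so the two-pass update it performs is sketched below. This is an illustration of the general technique, not the implementation used to train this model: the function name `sam_step` is hypothetical, and the neighborhood size `rho=0.05` is an assumption that this card does not specify.

```python
import torch
import torch.nn as nn


def sam_step(model, loss_fn, inputs, targets, base_optimizer, rho=0.05):
    """One SAM update: move to a nearby higher-loss point, then descend."""
    params = [p for p in model.parameters() if p.requires_grad]

    # First pass: gradient at the current weights.
    base_optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # Perturb each weight by rho * g / ||g|| (global gradient norm).
    grads = [p.grad for p in params if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
    perturbations = []
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                perturbations.append(None)
                continue
            e = p.grad * (rho / (grad_norm + 1e-12))
            p.add_(e)
            perturbations.append(e)

    # Second pass: gradient at the perturbed weights.
    base_optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()

    # Restore the original weights, then step using the second gradient.
    with torch.no_grad():
        for p, e in zip(params, perturbations):
            if e is not None:
                p.sub_(e)
    base_optimizer.step()
    return loss.item()
```

With the values from the table, the base optimizer would be `torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)` and each call would receive a batch of 20 images. Each update costs two forward and backward passes.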
How to use
import torch
import torch.nn as nn
from efficientnet_pytorch import EfficientNet


class FeatureExtractor(nn.Module):
    """EfficientNet backbone that returns the final convolutional feature map."""

    def __init__(self, model_name):
        super().__init__()
        self.efficient_net = EfficientNet.from_pretrained(model_name, advprop=True)
        self.efficient_net._fc = nn.Identity()  # drop the ImageNet classification head

    def forward(self, x):
        return self.efficient_net.extract_features(x)


class Detector(nn.Module):
    """Backbone, global average pooling, and a two-logit linear classifier."""

    def __init__(self):
        super().__init__()
        self.global_feature_extractor = FeatureExtractor("efficientnet-b4")
        self.global_pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(1792, 2)

    def forward(self, img):
        features = self.global_feature_extractor(img)   # (B, 1792, H, W)
        pooled = self.global_pool(features).flatten(1)  # (B, 1792)
        return self.classifier(pooled)                  # (B, 2) logits: [real, fake]


# Load the checkpoint on CPU; move the model to a GPU afterwards if needed
model = Detector()
checkpoint = torch.load("79_0.9980_val.tar", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()

# Inference: img_tensor must be a (B, 3, 380, 380) float tensor scaled to [0, 1],
# built from face crops extracted via RetinaFace
with torch.no_grad():
    logits = model(img_tensor)
scores = logits.softmax(dim=1)[:, 1]  # per-image probability of fake, shape (B,)
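
The input format expected above can be produced from an already-cropped face roughly as follows. This is a minimal sketch: the helper name `to_model_input` is hypothetical, the crop is assumed to be an RGB uint8 array in height × width × channel layout, and bilinear resizing is an assumption, since this card does not state which interpolation was used.

```python
import numpy as np
import torch
import torch.nn.functional as F


def to_model_input(face_crop, size=380):
    """Convert an RGB uint8 (H, W, 3) crop to a (1, 3, size, size) tensor in [0, 1]."""
    x = torch.from_numpy(np.ascontiguousarray(face_crop))
    x = x.permute(2, 0, 1).unsqueeze(0).contiguous()  # (1, 3, H, W)
    x = x.float() / 255.0                             # scale to [0, 1]
    # Bilinear resizing is an assumption, not a documented training setting.
    x = F.interpolate(x, size=(size, size), mode="bilinear", align_corners=False)
    return x.clamp(0.0, 1.0)
```

The returned tensor can be passed to the model as `img_tensor`; several crops can be combined into one batch with `torch.cat` along dimension 0.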