deepfake-detector-efficientnet-b4-contrastive

A deepfake image detector built on EfficientNet-B4, trained with a hybrid loss combining cross-entropy and contrastive learning. The contrastive objective organizes the feature space so that diverse manipulation techniques (face swapping, reenactment, attribute editing) cluster together and away from authentic images — improving generalization to unseen datasets.

Based on the Self-Blended Images (SBI) training framework (Shiohara & Yamasaki, CVPR 2022), enhanced with an optimized triplet-style contrastive loss.

Model description

The model uses an EfficientNet-B4 backbone to extract a convolutional feature map from a 380×380 face crop. Global average pooling reduces that map to a 1792-dimensional feature vector, and a linear classifier maps the vector to real/fake logits.

Training uses a hybrid loss: L_total = 0.7 · L_ce + 0.3 · L_contr

The contrastive loss operates at the batch level. For each fake anchor feature f_a, it identifies the nearest fake neighbor as the positive (f_p) and the nearest real neighbor as the negative (f_n), then minimizes: L_contr = mean( max(0, d(f_a, f_p) - d(f_a, f_n) + m) )

where d is Euclidean distance and margin m = 1.0.
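
The loss described above can be sketched in PyTorch as follows. This is a minimal illustration, not the released training code: the function names `batch_contrastive_loss` and `hybrid_loss` are hypothetical, the features are assumed to be the pooled 1792-dimensional vectors, labels are assumed to use 1 for fake and 0 for real, and excluding an anchor from being its own positive is an assumption about how "nearest fake neighbor" is meant.

```python
import torch
import torch.nn.functional as F


def batch_contrastive_loss(features, labels, margin=1.0):
    """Triplet-style loss with nearest-neighbor mining inside one batch.

    features: (B, D) pooled feature vectors.
    labels:   (B,) integer tensor, 1 = fake, 0 = real (assumed convention).
    """
    fake = features[labels == 1]
    real = features[labels == 0]
    # An anchor needs another fake (positive) and at least one real (negative).
    if fake.size(0) < 2 or real.size(0) < 1:
        return features.new_zeros(())

    d_ff = torch.cdist(fake, fake)  # Euclidean distances, fake to fake
    d_fr = torch.cdist(fake, real)  # Euclidean distances, fake to real

    # Assumption: an anchor may not pick itself as its nearest fake neighbor.
    eye = torch.eye(fake.size(0), dtype=torch.bool, device=features.device)
    d_ff = d_ff.masked_fill(eye, float("inf"))

    d_pos = d_ff.min(dim=1).values  # d(f_a, f_p)
    d_neg = d_fr.min(dim=1).values  # d(f_a, f_n)
    return F.relu(d_pos - d_neg + margin).mean()


def hybrid_loss(logits, features, labels, w_ce=0.7, w_contr=0.3):
    """L_total = 0.7 * L_ce + 0.3 * L_contr, using the weights stated above."""
    ce = F.cross_entropy(logits, labels)
    contr = batch_contrastive_loss(features, labels)
    return w_ce * ce + w_contr * contr
```

Batches with fewer than two fakes or no reals contribute zero contrastive loss in this sketch; how the original training loop handles such batches is not stated in this card.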

Intended use

Detecting AI-generated or manipulated face images
Research on face forgery detection and generalization
Cross-dataset evaluation benchmarks

Not intended for: real-time video analysis (the model scores individual frames only), non-face images, or use as a sole ground-truth arbiter of image authenticity.

Evaluation results

Cross-dataset evaluation — the model was trained on SBI synthetic data derived from FaceForensics++ and evaluated on two held-out datasets without any fine-tuning.

Dataset	Method	AUC	Accuracy
CelebDF-v2	SBI + contrastive (ours)	0.9396	0.8784
CelebDF-v2	SBI baseline (our impl.)	0.9385	0.8571
CelebDF-v2	SBI paper (reported)	0.9318	N/A
FFIW	SBI + contrastive (ours)	0.8275	0.6320
FFIW	SBI baseline (our impl.)	0.8122	0.6420
FFIW	SBI paper (reported)	0.8483	N/A

The contrastive-enhanced model outperforms our SBI baseline implementation on AUC on both datasets (0.9396 vs. 0.9385 on CelebDF-v2, 0.8275 vs. 0.8122 on FFIW). Accuracy improves on CelebDF-v2 but is slightly lower on FFIW (0.6320 vs. 0.6420). On CelebDF-v2 the model also surpasses the originally reported SBI AUC; on FFIW both of our runs fall short of the reported 0.8483. We attribute the gap between our baseline and the originally reported numbers to differences in training conditions and implementation details, a common challenge when reproducing deep learning results. The AUC improvement from adding contrastive learning is nonetheless consistent with the hypothesis that structuring the feature space around shared forgery patterns improves cross-dataset generalization.
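
The table reports both AUC and accuracy, and on FFIW the two move in opposite directions. The sketch below shows how that can happen in general: AUC depends only on how fake scores rank against real scores, while accuracy depends on a decision threshold. The helper names and the 0.5 threshold are assumptions for illustration; this card does not state which threshold was used or how the reported metrics were computed.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random fake outscores a random real.

    Ties count as half a win. labels: 1 = fake, 0 = real.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one fake and one real sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples classified correctly at a fixed threshold (assumed 0.5)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy data: every fake ranks above every real, but all scores exceed 0.5.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55]
labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(scores, labels))   # 1.0: the ranking is perfect
print(accuracy(scores, labels))  # 0.5: every real is flagged as fake at 0.5
```

A score distribution that shifts on an unseen dataset can therefore lower accuracy without hurting AUC. Whether this explains the FFIW numbers here has not been verified.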

Training details

Parameter	Value
Backbone	EfficientNet-B4 (advprop)
Input size	380 × 380
Optimizer	SAM (base: SGD, momentum 0.9)
Learning rate	1e-3 with LinearDecayLR
Epochs	100
Batch size	20
Framework	PyTorch
Face detector	RetinaFace (resnet50)
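
For readers unfamiliar with SAM (sharpness-aware minimization), the sketch below shows one update step wrapped around a base SGD optimizer with momentum 0.9 and learning rate 1e-3, as listed in the table. It is a generic illustration of the SAM procedure, not the training script behind this model: the function name `sam_step` is hypothetical, the neighborhood size `rho=0.05` is an assumed value not given in this card, and the LinearDecayLR schedule is omitted.

```python
import torch
import torch.nn as nn


def sam_step(model, loss_fn, inputs, targets, base_optimizer, rho=0.05):
    """One sharpness-aware update: two forward/backward passes per batch."""
    # Pass 1: gradient at the current weights w.
    base_optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.stack([p.grad.norm(p=2) for p in params]).norm(p=2)
    scale = rho / (grad_norm + 1e-12)

    # Move to the approximate worst-case point w + e within radius rho.
    perturbations = []
    with torch.no_grad():
        for p in params:
            e = p.grad * scale
            p.add_(e)
            perturbations.append(e)

    # Pass 2: gradient at the perturbed weights.
    base_optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()

    # Restore w, then let the base optimizer step with the pass-2 gradient.
    with torch.no_grad():
        for p, e in zip(params, perturbations):
            p.sub_(e)
    base_optimizer.step()
    return loss.item()


# Usage on a stand-in model; the real backbone is EfficientNet-B4.
toy_model = nn.Linear(8, 2)
optimizer = torch.optim.SGD(toy_model.parameters(), lr=1e-3, momentum=0.9)
x, y = torch.randn(20, 8), torch.randint(0, 2, (20,))  # batch size 20
sam_step(toy_model, nn.CrossEntropyLoss(), x, y, optimizer)
```

Each SAM step costs roughly two ordinary training steps, since the batch is passed through the network twice.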

How to use

import torch
from efficientnet_pytorch import EfficientNet
import torch.nn as nn

class FeatureExtractor(nn.Module):
    def __init__(self, model_name):
        super().__init__()
        self.efficient_net = EfficientNet.from_pretrained(model_name, advprop=True)
        self.efficient_net._fc = nn.Identity()

    def forward(self, x):
        return self.efficient_net.extract_features(x)

class Detector(nn.Module):
    def __init__(self):
        super().__init__()
        self.global_feature_extractor = FeatureExtractor("efficientnet-b4")
        self.global_pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(1792, 2)

    def forward(self, img):
        features = self.global_feature_extractor(img)
        pooled = self.global_pool(features).view(features.size(0), -1)
        return self.classifier(pooled)

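# Load the checkpoint on CPU first so this also works without a GPU
model = Detector()
checkpoint = torch.load("79_0.9980_val.tar", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()

# Inference: img_tensor must be supplied by the caller as a
# (B, 3, 380, 380) float tensor normalized to [0, 1],
# with face crops extracted via RetinaFace
with torch.no_grad():
    logits = model(img_tensor)
    scores = logits.softmax(dim=1)[:, 1]  # probability of fake, one per image
    # For a single image (B == 1), scores.item() returns a Python float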
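
The inference step expects `img_tensor` to already have the right shape and range. The sketch below shows one way to get there from a face crop that has already been extracted. The helper name `crop_to_tensor` is hypothetical, and RGB channel order and bilinear resizing are assumptions; this card only specifies the final shape and the [0, 1] range. Face detection with RetinaFace is not shown.

```python
import torch
import torch.nn.functional as F


def crop_to_tensor(face_crop_uint8, size=380):
    """Convert an (H, W, 3) uint8 face crop into a (1, 3, size, size) float tensor in [0, 1]."""
    if face_crop_uint8.ndim != 3 or face_crop_uint8.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) crop")
    # HWC uint8 -> NCHW float in [0, 1]
    x = face_crop_uint8.permute(2, 0, 1).unsqueeze(0).float() / 255.0
    # Resize to the network input size (bilinear is an assumed choice)
    x = F.interpolate(x, size=(size, size), mode="bilinear", align_corners=False)
    return x.clamp(0.0, 1.0)


# Example with a random stand-in crop; a real crop could come from
# torch.from_numpy(...) applied to an RGB image array.
dummy_crop = torch.randint(0, 256, (200, 150, 3), dtype=torch.uint8)
img_tensor = crop_to_tensor(dummy_crop)
print(img_tensor.shape)  # torch.Size([1, 3, 380, 380])
```

Several crops can be batched with `torch.cat([...], dim=0)` before being passed to the model.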

