deepfake-detector-efficientnet-b4-contrastive

A deepfake image detector built on EfficientNet-B4, trained with a hybrid loss combining cross-entropy and contrastive learning. The contrastive objective organizes the feature space so that diverse manipulation techniques (face swapping, reenactment, attribute editing) cluster together and away from authentic images — improving generalization to unseen datasets.

Based on the Self-Blended Images (SBI) training framework (Shiohara & Yamasaki, CVPR 2022), enhanced with an optimized triplet-style contrastive loss.


Model description

The model uses an EfficientNet-B4 backbone to extract a 1792-dimensional feature vector from a 380×380 face crop, followed by global average pooling and a linear classifier producing real/fake logits.

Training uses a hybrid loss: L_total = 0.7 · L_ce + 0.3 · L_contr

The contrastive loss operates at the batch level. For each fake anchor feature f_a, it identifies the nearest fake neighbor as the positive (f_p) and the nearest real neighbor as the negative (f_n), then minimizes: L_contr = mean( max(0, d(f_a, f_p) - d(f_a, f_n) + m) )

where d is Euclidean distance and margin m = 1.0.
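The batch-level loss above can be sketched as follows. This is an illustrative implementation of the description, not the authors' exact code; the function name and the label convention (1 = fake, 0 = real) are assumptions.

```python
import torch

def contrastive_loss(features, labels, margin=1.0):
    """Triplet-style contrastive loss over a batch, as described above.

    For each fake anchor, the nearest other fake feature is the positive
    and the nearest real feature is the negative (Euclidean distance).
    """
    fake = features[labels == 1]                 # (Nf, D) fake features
    real = features[labels == 0]                 # (Nr, D) real features
    if fake.size(0) < 2 or real.size(0) == 0:
        return features.new_zeros(())            # no valid triplets in batch

    d_ff = torch.cdist(fake, fake)               # fake-to-fake distances
    d_fr = torch.cdist(fake, real)               # fake-to-real distances

    d_ff.fill_diagonal_(float("inf"))            # exclude self as positive
    d_pos = d_ff.min(dim=1).values               # d(f_a, f_p): nearest fake
    d_neg = d_fr.min(dim=1).values               # d(f_a, f_n): nearest real

    return torch.clamp(d_pos - d_neg + margin, min=0).mean()

# Hybrid objective from above:
# loss = 0.7 * F.cross_entropy(logits, labels) + 0.3 * contrastive_loss(feats, labels)
```

Note the loss is zero whenever every fake anchor is already at least `margin` closer to its nearest fake than to its nearest real sample.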


Intended use

  • Detecting AI-generated or manipulated face images
  • Research on face forgery detection and generalization
  • Cross-dataset evaluation benchmarks

Not intended for: real-time video analysis (frame-level only), non-face images, or use as a sole ground-truth arbiter of image authenticity.


Evaluation results

Cross-dataset evaluation — the model was trained on SBI synthetic data derived from FaceForensics++ and evaluated on two held-out datasets without any fine-tuning.

Dataset      Method                     AUC     Accuracy
CelebDF-v2   SBI + contrastive (ours)   0.9396  0.8784
CelebDF-v2   SBI baseline (our impl.)   0.9385  0.8571
CelebDF-v2   SBI paper (reported)       0.9318  N/A
FFIW         SBI + contrastive (ours)   0.8275  0.6320
FFIW         SBI baseline (our impl.)   0.8122  0.6420
FFIW         SBI paper (reported)       0.8483  N/A

The contrastive-enhanced model outperforms our SBI baseline implementation on AUC across both datasets, though the baseline retains a small accuracy edge on FFIW. On CelebDF-v2 it also surpasses the originally reported SBI result. The gap between our baseline and the originally reported numbers is attributed to differences in training conditions and implementation details, a common challenge when reproducing deep learning results. The relative improvement from adding contrastive learning nonetheless supports the hypothesis that structuring the feature space around shared forgery patterns improves cross-dataset generalization.
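AUC in the table above is the standard area under the ROC curve, computed from per-image fake probabilities. A minimal sketch with scikit-learn, using made-up scores purely for illustration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical ground truth (1 = fake) and predicted fake probabilities.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

auc = roc_auc_score(y_true, y_score)  # 0.75
```

Because AUC is threshold-free, it is the more robust cross-dataset metric here; accuracy depends on a fixed 0.5 decision threshold that may not transfer between datasets.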


Training details

Parameter       Value
Backbone        EfficientNet-B4 (advprop)
Input size      380 × 380
Optimizer       SAM (base: SGD, momentum 0.9)
Learning rate   1e-3 with LinearDecayLR
Epochs          100
Batch size      20
Framework       PyTorch
Face detector   RetinaFace (resnet50)
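The learning-rate schedule can be approximated with PyTorch's built-in `LambdaLR`. This sketch assumes a linear decay to zero over the full 100 epochs, which may differ from the exact `LinearDecayLR` used in the SBI codebase, and shows only the SGD base optimizer (the SAM wrapper is omitted for brevity):

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(4, 2)  # placeholder model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

n_epochs = 100
# Scale the base LR linearly from 1.0 down to 0.0 over training.
scheduler = LambdaLR(optimizer, lr_lambda=lambda epoch: 1 - epoch / n_epochs)

# Per epoch: run the training loop, then advance the schedule.
# for epoch in range(n_epochs):
#     train_one_epoch(model, optimizer)  # hypothetical helper
#     scheduler.step()
```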

How to use

import torch
from efficientnet_pytorch import EfficientNet
import torch.nn as nn

class FeatureExtractor(nn.Module):
    def __init__(self, model_name):
        super().__init__()
        self.efficient_net = EfficientNet.from_pretrained(model_name, advprop=True)
        self.efficient_net._fc = nn.Identity()

    def forward(self, x):
        return self.efficient_net.extract_features(x)

class Detector(nn.Module):
    def __init__(self):
        super().__init__()
        self.global_feature_extractor = FeatureExtractor("efficientnet-b4")
        self.global_pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(1792, 2)

    def forward(self, img):
        features = self.global_feature_extractor(img)
        pooled = self.global_pool(features).view(features.size(0), -1)
        return self.classifier(pooled)

# Load checkpoint weights
model = Detector()
checkpoint = torch.load("79_0.9980_val.tar", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()

# Inference — expects a (B, 3, 380, 380) float tensor normalized to [0, 1]
# with face crops extracted via RetinaFace
with torch.no_grad():
    logits = model(img_tensor)
    scores = logits.softmax(dim=1)[:, 1]  # per-image probability of fake, shape (B,)