Deepfake Video Detection (Frame-Based EfficientNet)

Model Card


Model Overview

This model is a frame-based deepfake detector that classifies facial images extracted from videos as Real or Fake.
Although it is trained at the image (face) level, its per-frame predictions are aggregated to produce video-level decisions.

The model is intended as a strong baseline for deepfake detection research and practical experimentation.


Model Architecture

  • Backbone: EfficientNet-B0
  • Pretraining: ImageNet
  • Classifier: Fully connected layer (2 outputs)
  • Framework: PyTorch

Label Mapping

  • 0 → Real
  • 1 → Fake


Input & Output

Input

  • RGB face images
  • Image size: 224 × 224
  • Faces cropped using MTCNN

Output

  • Probability distribution over:
    • Real
    • Fake

The final decision is based on the Fake class probability.
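
To make the output step concrete, here is a minimal sketch of how two raw classifier outputs could be turned into a Real/Fake decision. It assumes the classifier emits two logits ordered as in the label mapping above (index 0 is Real, index 1 is Fake). The names `softmax` and `classify_face`, the example logits, and the 0.5 cutoff are illustrative assumptions; the card does not state the threshold that was actually used.

```python
import math

# Label mapping taken from this model card.
LABELS = {0: "Real", 1: "Fake"}


def softmax(logits):
    """Convert raw classifier outputs into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_face(logits, fake_threshold=0.5):
    """Return (label, fake_probability) for a single face crop.

    fake_threshold is a hypothetical value, not one stated in the card.
    """
    probs = softmax(logits)
    fake_prob = probs[1]  # index 1 is the Fake class
    label = LABELS[1] if fake_prob >= fake_threshold else LABELS[0]
    return label, fake_prob


# Made-up logits for illustration only.
label, fake_prob = classify_face([0.2, 1.3])
print(label, round(fake_prob, 3))
```

Raising `fake_threshold` trades recall for precision on the Fake class, which may matter more than raw accuracy depending on the application.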


Training Data

Dataset

  • Celeb-DF (V2)
    • High-quality celebrity deepfake videos
    • Balanced subset used for training
    • Faces extracted from video frames

Training Configuration

  • Optimizer: Adam
  • Loss Function: CrossEntropyLoss
  • Epochs: 10
  • Batch Size: 8
  • Image Normalization: ImageNet mean & std
  • Hardware: NVIDIA RTX 3050 (4GB)
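
As an illustration of the image normalization entry above, the following sketch standardizes a single 8-bit RGB pixel with the commonly used ImageNet channel statistics. The function name `normalize_pixel` is hypothetical. In a PyTorch pipeline this step would typically be handled by `torchvision.transforms.Normalize` on whole tensors, and the card does not show the exact transform code that was used.

```python
# Widely used ImageNet per-channel statistics (RGB order, inputs scaled to [0, 1]).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


# A pure white and a pure black pixel mark the extremes of the normalized range.
print(normalize_pixel((255, 255, 255)))
print(normalize_pixel((0, 0, 0)))
```

The same statistics need to be applied at inference time, since the pretrained backbone expects inputs in this normalized range.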

Evaluation Metrics

Evaluation was performed on a held-out test set.

Metric       Value
Accuracy     ~77%
Precision    ~0.78
Recall       ~0.76
F1-Score     ~0.77
ROC-AUC      ~0.83

These results suggest that the model picks up some of the spatial artifacts associated with deepfake generation, although roughly one in four test samples is still misclassified.
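
As a quick consistency check on the reported figures, F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two values. The helper name `f1_score` is illustrative. The inputs are the approximate values reported above, so the result is approximate as well.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Approximate precision and recall as reported in this model card.
reported_f1 = f1_score(0.78, 0.76)
print(round(reported_f1, 2))  # 0.77
```

The recomputed value agrees with the reported F1-Score of about 0.77.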


Video-Level Deepfake Detection

Although trained on images, the model supports video deepfake detection through the following pipeline:

  1. Frame extraction from the input video
  2. Face detection using MTCNN
  3. Frame-level prediction using the trained CNN
  4. Aggregation using:
    • Average fake confidence
    • Fake frame ratio

A video is classified as Fake if aggregated metrics exceed predefined thresholds.
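
The aggregation step can be sketched as follows, assuming the frame-level stage has already produced one Fake probability per detected face. The function name `aggregate_video` and all three threshold values are hypothetical, because the card does not list the thresholds it uses. This sketch also assumes that both metrics must reach their thresholds, which is one possible reading of the rule above.

```python
def aggregate_video(frame_fake_probs, frame_threshold=0.5,
                    avg_threshold=0.5, ratio_threshold=0.5):
    """Combine per-frame Fake probabilities into a video-level decision.

    All thresholds are illustrative defaults, not values from the card.
    """
    if not frame_fake_probs:
        raise ValueError("no face predictions available for this video")

    n = len(frame_fake_probs)
    # Metric 1: average Fake confidence across all frames.
    avg_fake_confidence = sum(frame_fake_probs) / n
    # Metric 2: fraction of frames individually predicted as Fake.
    fake_frames = sum(1 for p in frame_fake_probs if p >= frame_threshold)
    fake_frame_ratio = fake_frames / n

    # Assumption: the video is Fake only if both metrics reach their thresholds.
    is_fake = (avg_fake_confidence >= avg_threshold
               and fake_frame_ratio >= ratio_threshold)
    return {
        "avg_fake_confidence": avg_fake_confidence,
        "fake_frame_ratio": fake_frame_ratio,
        "label": "Fake" if is_fake else "Real",
    }


# Made-up per-frame probabilities for illustration.
result = aggregate_video([0.9, 0.8, 0.2, 0.7])
print(result)
```

Requiring both metrics makes the decision less sensitive to a few high-confidence outlier frames. Accepting either metric would favor recall instead.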


Intended Use

Recommended Use

  • Academic research
  • Learning and experimentation
  • Baseline deepfake detection
  • Video authenticity analysis
  • Demonstration and prototyping

Not Recommended Use

  • Legal or forensic evidence
  • Fully automated content moderation

Limitations

  • Frame-based approach (no temporal modeling)
  • Struggles with high-quality deepfakes (e.g., Celeb-DF)
  • Cannot capture temporal artifacts such as:
    • Lip-sync mismatch
    • Eye-blink inconsistencies
    • Frame-to-frame motion artifacts

Developer

Divyanshu Chauhan

AI & Machine Learning Engineer

Specialization: Computer Vision & Deepfake Detection

Portfolio: https://divyanshu-chauhan-7786.github.io/divyanshu-chauhan/
