Deepfake Video Detection (Frame-Based EfficientNet)

Model Card

Model Overview

This model is a frame-based deepfake detector that classifies face crops extracted from video frames as Real or Fake.
Although it is trained on individual face images, its per-frame predictions can be aggregated into a video-level decision.

The model is intended as a strong baseline for deepfake detection research and practical experimentation.

Model Architecture

Backbone: EfficientNet-B0
Pretraining: ImageNet
Classifier: Fully connected layer (2 outputs)
Framework: PyTorch

Label Mapping

0 → Real
1 → Fake

Input & Output

Input

RGB face images
Image size: 224 × 224
Faces cropped using MTCNN

Output

Probability distribution over:
- Real
- Fake

The final decision is based on the Fake class probability (output index 1).
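
As a minimal sketch of how the two classifier outputs could be turned into a decision, the snippet below applies a softmax to a pair of logits and thresholds the Fake probability. The function names and the 0.5 threshold are illustrative assumptions; the model card does not state the threshold actually used.

```python
import math

# Label mapping taken from the model card: index 0 is Real, index 1 is Fake.
LABELS = ("Real", "Fake")


def softmax(logits):
    """Convert raw classifier outputs into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_face(logits, fake_threshold=0.5):
    """Return (label, fake_probability) for one face crop.

    fake_threshold is a hypothetical default, not a value from the model card.
    """
    probs = softmax(logits)
    fake_prob = probs[1]
    label = LABELS[1] if fake_prob >= fake_threshold else LABELS[0]
    return label, fake_prob
```

Raising `fake_threshold` trades recall for precision on the Fake class, so it is worth tuning on a validation set rather than leaving it at the default.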

Training Data

Dataset

Celeb-DF (V2)
- High-quality celebrity deepfake videos
- Balanced subset used for training
- Faces extracted from video frames

Training Configuration

Optimizer: Adam
Loss Function: CrossEntropyLoss
Epochs: 10
Batch Size: 8
Image Normalization: ImageNet mean & std
Hardware: NVIDIA RTX 3050 (4 GB)
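
To make the normalization step concrete, here is a small sketch of what "ImageNet mean & std" normalization does to a single 8-bit RGB pixel. The constants are the standard ImageNet channel statistics; the helper name `normalize_pixel` is hypothetical, and in a PyTorch pipeline this step would more typically be done with `torchvision.transforms.Normalize` on whole tensors.

```python
# Standard ImageNet channel statistics, in RGB order.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )
```

The same statistics must be applied at inference time; feeding unnormalized crops to a model trained this way would shift its inputs away from the training distribution.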

Evaluation Metrics

Evaluation was performed on a held-out test set.

Metric	Value
Accuracy	~77%
Precision	~0.78
Recall	~0.76
F1-Score	~0.77
ROC-AUC	~0.83

These results suggest that the model picks up some spatial artifacts associated with deepfake generation, although an accuracy of ~77% still leaves a substantial share of frames misclassified.
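
For reference, accuracy, precision, recall, and F1 all derive from the four confusion-matrix counts. The sketch below assumes Fake is the positive class (the card does not say which class was treated as positive), and the function name is hypothetical. ROC-AUC is not included because it depends on the ranking of per-sample scores rather than on counts at a single threshold.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute threshold-based metrics from confusion-matrix counts.

    tp/fn count Fake samples predicted Fake/Real;
    fp/tn count Real samples predicted Fake/Real.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```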

Video-Level Deepfake Detection

Although trained on images, the model supports video deepfake detection through the following pipeline:

Frame extraction from the input video
Face detection using MTCNN
Frame-level prediction using the trained CNN
Aggregation using:
- Average fake confidence
- Fake frame ratio

A video is classified as Fake if aggregated metrics exceed predefined thresholds.
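
The aggregation step can be sketched as follows, taking the per-frame Fake probabilities as input. All three thresholds are placeholder values, and requiring both metrics to pass is an assumption: the card states only that aggregated metrics must exceed predefined thresholds, not their values or how they are combined.

```python
def aggregate_video(fake_probs, frame_threshold=0.5,
                    avg_threshold=0.5, ratio_threshold=0.5):
    """Combine per-frame Fake probabilities into a video-level decision.

    All thresholds are hypothetical defaults, not values from the model card.
    """
    if not fake_probs:
        raise ValueError("no face detections to aggregate")

    n = len(fake_probs)
    # Average fake confidence across all frames with a detected face.
    avg_fake_confidence = sum(fake_probs) / n
    # Fraction of frames individually classified as Fake.
    fake_frame_ratio = sum(1 for p in fake_probs if p >= frame_threshold) / n

    # Assumption: both metrics must reach their thresholds.
    is_fake = (avg_fake_confidence >= avg_threshold
               and fake_frame_ratio >= ratio_threshold)
    return {
        "label": "Fake" if is_fake else "Real",
        "avg_fake_confidence": avg_fake_confidence,
        "fake_frame_ratio": fake_frame_ratio,
    }
```

Frames in which no face is detected are best left out of `fake_probs` rather than counted as Real, since otherwise a video with few detectable faces would be biased toward a Real verdict.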

Intended Use

Recommended Use

Academic research
Learning and experimentation
Baseline deepfake detection
Video authenticity analysis
Demonstration and prototyping

Not Recommended Use

Legal or forensic evidence
Fully automated content moderation

Limitations

Frame-based approach (no temporal modeling)
Struggles with high-quality deepfakes (e.g., Celeb-DF)
Cannot capture temporal artifacts such as:
- Lip-sync mismatch
- Eye-blink inconsistencies
- Frame-to-frame motion artifacts

Developer

Divyanshu Chauhan

AI & Machine Learning Engineer

Specialization: Computer Vision & Deepfake Detection

Portfolio: https://divyanshu-chauhan-7786.github.io/divyanshu-chauhan/
