Deepfake Video Detection (Frame-Based EfficientNet)
Model Card
Model Overview
This model is a frame-based deepfake detection system designed to identify whether facial images extracted from videos are Real or Fake.
Although trained at the image (face) level, the model is extended to video-level deepfake detection by aggregating its frame-level predictions.
The model is intended as a strong baseline for deepfake detection research and practical experimentation.
Model Architecture
- Backbone: EfficientNet-B0
- Pretraining: ImageNet
- Classifier: Fully connected layer (2 outputs)
- Framework: PyTorch
Label Mapping
- 0 → Real
- 1 → Fake
Input & Output
Input
- RGB face images
- Image size: 224 × 224
- Faces cropped using MTCNN
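The following minimal NumPy sketch shows how an MTCNN face crop could become a model input. It assumes the crop is already an RGB `uint8` array of shape H × W × 3. `preprocess_face` is a hypothetical helper. It uses nearest-neighbour resizing only to stay dependency-free, whereas a real pipeline would more likely resize bilinearly (e.g. with PIL or torchvision transforms). The mean and std are the standard ImageNet statistics, matching "ImageNet mean & std" in the training configuration.

```python
import numpy as np

# Standard ImageNet per-channel statistics (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_face(face_rgb: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize an H x W x 3 uint8 RGB face crop and return a normalised 3 x size x size array."""
    if face_rgb.ndim != 3 or face_rgb.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB array")
    h, w, _ = face_rgb.shape
    # Nearest-neighbour resize (a simplification; bilinear is more common).
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = face_rgb[rows[:, None], cols[None, :]]
    # Scale to [0, 1], normalise per channel, then convert HWC -> CHW.
    x = resized.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    return x.transpose(2, 0, 1)
```

Stacking several outputs with `np.stack` gives an N × 3 × 224 × 224 batch that `torch.from_numpy` can hand to the network.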
Output
- Probability distribution over:
  - Real
  - Fake
The final decision is based on the Fake-class probability, obtained by applying softmax to the two output logits.
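The following sketch shows one way to convert the two output logits into a Fake probability and a label. The 0.5 decision threshold and the helper names are assumptions, because the card does not state the threshold it uses.

```python
import math
from typing import Sequence, Tuple

LABELS = {0: "Real", 1: "Fake"}  # label mapping from this card


def fake_probability(logits: Sequence[float]) -> float:
    """Softmax over [real_logit, fake_logit], returning the Fake-class probability."""
    if len(logits) != 2:
        raise ValueError("expected exactly two logits: [Real, Fake]")
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return exps[1] / (exps[0] + exps[1])


def classify_face(logits: Sequence[float], threshold: float = 0.5) -> Tuple[str, float]:
    """Return (label, fake_probability); the 0.5 threshold is a hypothetical default."""
    p_fake = fake_probability(logits)
    label = LABELS[1] if p_fake >= threshold else LABELS[0]
    return label, p_fake
```

For a batch of PyTorch outputs, `torch.softmax(logits, dim=1)[:, 1]` computes the same Fake probability for every row.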
Training Data
Dataset
- Celeb-DF (V2)
- High-quality celebrity deepfake videos
- Balanced subset used for training
- Faces extracted from video frames
Training Configuration
- Optimizer: Adam
- Loss Function: CrossEntropyLoss
- Epochs: 10
- Batch Size: 8
- Image Normalization: ImageNet mean & std
- Hardware: NVIDIA RTX 3050 (4GB)
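The following minimal training loop follows the listed configuration: Adam, `CrossEntropyLoss`, 10 epochs, and batch size 8 (set on the `DataLoader`). The `1e-4` learning rate is a hypothetical default because the card does not state one. `train` is an illustrative name and not the author's code.

```python
from typing import List

import torch
from torch import nn
from torch.utils.data import DataLoader


def train(model: nn.Module, loader: DataLoader, epochs: int = 10,
          lr: float = 1e-4, device: str = "cpu") -> List[float]:
    """Train with Adam + CrossEntropyLoss; returns the mean loss per epoch.

    Assumption: lr=1e-4 is a placeholder, not the card's actual value.
    """
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()  # expects raw logits and integer labels 0/1
    history: List[float] = []
    for _ in range(epochs):
        model.train()
        running, seen = 0.0, 0
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            running += loss.item() * images.size(0)  # weight by batch size
            seen += images.size(0)
        if seen == 0:
            raise ValueError("the data loader yielded no samples")
        history.append(running / seen)
    return history
```

In use, the loader would be built as `DataLoader(train_dataset, batch_size=8, shuffle=True)`, where `train_dataset` (hypothetical) yields normalised 3 × 224 × 224 face tensors with labels 0 (Real) or 1 (Fake). On a GPU, pass `device="cuda"`.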
Evaluation Metrics
Evaluation was performed on a held-out test set.
| Metric | Value |
|---|---|
| Accuracy | ~77% |
| Precision | ~0.78 |
| Recall | ~0.76 |
| F1-Score | ~0.77 |
| ROC-AUC | ~0.83 |
These results suggest that the model learns meaningful spatial artifacts associated with deepfake generation.
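For reference, the reported metrics can be computed from test-set predictions as shown below. The sketch treats Fake (label 1) as the positive class, which the card does not state explicitly. ROC-AUC uses Fake-class probabilities and the pairwise (Mann-Whitney) formulation. That formulation is quadratic in the number of samples, so for a large test set `sklearn.metrics.roc_auc_score` is the practical choice. The function names are illustrative.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with label 1 (Fake) as the positive class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], fake_probs: Sequence[float]) -> float:
    """Probability that a random Fake sample outscores a random Real one (ties count 0.5)."""
    pos = [s for s, t in zip(fake_probs, y_true) if t == 1]
    neg = [s for s, t in zip(fake_probs, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one Real and one Fake sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```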
Video-Level Deepfake Detection
Although trained on images, the model supports video deepfake detection through the following pipeline:
- Frame extraction from the input video
- Face detection using MTCNN
- Frame-level prediction using the trained CNN
- Aggregation using:
  - Average fake confidence
  - Fake frame ratio
A video is classified as Fake if the aggregated metrics exceed predefined thresholds.
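The aggregation step can be sketched as shown below. The card does not publish its thresholds or say whether one or both metrics must exceed them, so the 0.5 defaults, the `require_both` switch, and the function and class names are all hypothetical. The input is the list of per-frame Fake probabilities for frames in which MTCNN detected a face.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class VideoVerdict:
    label: str                  # "Real" or "Fake"
    avg_fake_confidence: float  # mean per-frame Fake probability
    fake_frame_ratio: float     # fraction of frames whose Fake probability exceeds frame_threshold


def aggregate_video(
    frame_fake_probs: Sequence[float],
    frame_threshold: float = 0.5,  # hypothetical per-frame cut-off
    avg_threshold: float = 0.5,    # hypothetical cut-off on the average confidence
    ratio_threshold: float = 0.5,  # hypothetical cut-off on the fake frame ratio
    require_both: bool = False,    # assumption: by default either metric can flag the video
) -> VideoVerdict:
    """Aggregate per-frame Fake probabilities into a video-level verdict."""
    if len(frame_fake_probs) == 0:
        raise ValueError("no frame predictions (were any faces detected?)")
    n = len(frame_fake_probs)
    avg_conf = sum(frame_fake_probs) / n
    ratio = sum(1 for p in frame_fake_probs if p > frame_threshold) / n
    avg_hit = avg_conf > avg_threshold
    ratio_hit = ratio > ratio_threshold
    is_fake = (avg_hit and ratio_hit) if require_both else (avg_hit or ratio_hit)
    return VideoVerdict("Fake" if is_fake else "Real", avg_conf, ratio)
```

Flagging on either metric is the more sensitive choice. Requiring both metrics reduces false positives but misses videos in which only a few frames are strongly manipulated.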
Intended Use
Recommended Use
- Academic research
- Learning and experimentation
- Baseline deepfake detection
- Video authenticity analysis
- Demonstration and prototyping
Not Recommended Use
- Legal or forensic evidence
- Fully automated content moderation
Limitations
- Frame-based approach (no temporal modeling)
- Struggles with high-quality deepfakes (e.g., Celeb-DF)
- Cannot capture temporal artifacts such as:
  - Lip-sync mismatch
  - Eye-blink inconsistencies
  - Frame-to-frame motion artifacts
Developer
Divyanshu Chauhan
AI & Machine Learning Engineer
Specialization: Computer Vision & Deepfake Detection
Portfolio: https://divyanshu-chauhan-7786.github.io/divyanshu-chauhan/