---
tags:
- deepfake-detection
- computer-vision
- image-classification
- pytorch
- efficientnet
- swin-transformer
- security
library_name: pytorch
license: mit
metrics:
- accuracy
- f1
pipeline_tag: image-classification
inference: false
widgets:
- text: "Test the DeepGuard Model Live"
  src: "https://harshasnade-deepfake-detection-system-v1.hf.space"
---

# DeepGuard - Deepfake Detection System

## Model Details

### Model Description

DeepGuard is a robust Deepfake Detection System designed to identify AI-generated images with high precision. It employs an ensemble architecture combining **EfficientNetV2-S** and **Swin Transformer V2-T** with a custom Convolutional Neural Network (CNN) head. This hybrid approach leverages both local feature extraction (CNN) and global context understanding (Transformers) to spot manipulation artifacts often invisible to the human eye.

- **Developed by:** Harshvardhan Asnade
- **Model type:** Ensemble (EfficientNetV2 + SwinV2 + Custom CNN)
- **Language(s):** Python, PyTorch
- **License:** MIT
- **Finetuned from model:** Torchvision pre-trained weights (ImageNet)

### Model Sources

- **Repository:** https://github.com/Harshvardhan-Asnade/Deepfake-Model
- **Demo:** https://deepfakescan.vercel.app/ (Live Web App)

## Uses

### Direct Use

The model is designed to classify single images as either **REAL** or **FAKE**. It outputs a probability score (0.0 - 1.0) and a confidence metric. It is suitable for:
- Content moderation
- Social media verification
- Digital forensics (preliminary analysis)

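
As a rough illustration of how such an output could be derived, the sketch below turns a single raw logit into a label, a probability, and a confidence value. It assumes the probability refers to the FAKE class, a 0.5 decision threshold, and confidence defined as the probability of the predicted label. None of these are stated in this card, and `interpret_logit` is a hypothetical helper, not part of the released code.

```python
import math


def interpret_logit(logit, threshold=0.5):
    """Map one raw logit to (label, prob_fake, confidence).

    Assumptions (not confirmed by the model card): the sigmoid of the
    logit is the probability of FAKE, and confidence is the probability
    assigned to whichever label was chosen.
    """
    # Numerically stable sigmoid: avoids overflow for large |logit|.
    if logit >= 0:
        prob_fake = 1.0 / (1.0 + math.exp(-logit))
    else:
        e = math.exp(logit)
        prob_fake = e / (1.0 + e)

    label = "FAKE" if prob_fake >= threshold else "REAL"
    confidence = max(prob_fake, 1.0 - prob_fake)
    return label, prob_fake, confidence


if __name__ == "__main__":
    print(interpret_logit(2.0))   # leans FAKE
    print(interpret_logit(-2.0))  # leans REAL
```

Under these assumptions a logit of 0 sits exactly on the decision boundary, so the confidence is 0.5, the lowest value this definition can produce.
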
### Out-of-Scope Use

- **Video Analysis:** While it can analyze individual frames, it does not currently leverage temporal coherence in videos (frame-by-frame analysis only).
- **Audio Deepfakes:** This model is strictly for visual content.
- **Legal Proof:** The model provides a probabilistic assessment and should not be the sole basis for legal judgments.

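
Because frames are scored independently, any video-level summary has to be assembled outside the model. The following is a minimal sketch of one possible aggregation, assuming per-frame FAKE probabilities have already been computed. The function name, the 0.5 frame threshold, and the choice of statistics are illustrative assumptions, not part of DeepGuard, and the result still ignores temporal coherence.

```python
from statistics import mean


def summarize_frames(frame_probs, frame_threshold=0.5):
    """Summarize independent per-frame FAKE probabilities for one video.

    Hypothetical helper: it only aggregates scores and does not model
    any relationship between consecutive frames.
    """
    if not frame_probs:
        raise ValueError("frame_probs must contain at least one score")

    # Count frames whose score reaches the (assumed) per-frame threshold.
    flagged = sum(1 for p in frame_probs if p >= frame_threshold)
    return {
        "mean_prob_fake": mean(frame_probs),
        "flagged_ratio": flagged / len(frame_probs),
        "num_frames": len(frame_probs),
    }


if __name__ == "__main__":
    print(summarize_frames([0.2, 0.8, 0.5]))
```
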
## How to Get Started with the Model

```python
import torch
import torch.nn as nn
from torchvision import models
from safetensors.torch import load_file
import cv2


# Define Model Architecture
class DeepfakeDetector(nn.Module):
    def __init__(self, pretrained=False):
        super(DeepfakeDetector, self).__init__()
        # Two backbones: EfficientNetV2-S and Swin Transformer V2-T
        self.efficientnet = models.efficientnet_v2_s(weights='DEFAULT' if pretrained else None)
        self.swin = models.swin_v2_t(weights='DEFAULT' if pretrained else None)

        # Replace the ImageNet heads so each backbone returns pooled features
        self.efficientnet.classifier = nn.Identity()
        self.swin.head = nn.Identity()

        # Fusion head: 1280 (EfficientNetV2-S) + 768 (SwinV2-T) features -> 1 logit
        self.classifier = nn.Sequential(
            nn.Linear(1280 + 768, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(),
            nn.Dropout(0.4),
            nn.Linear(512, 1)
        )

    def forward(self, x):
        f1 = self.efficientnet(x)  # (batch, 1280)
        f2 = self.swin(x)          # (batch, 768)
        combined = torch.cat((f1, f2), dim=1)
        return self.classifier(combined)  # raw logit, shape (batch, 1)


# Load Model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = DeepfakeDetector(pretrained=False).to(device)
state_dict = load_file("best_model.safetensors")
model.load_state_dict(state_dict)
model.eval()
```
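
The snippet above stops once the weights are loaded. A minimal sketch of a single-image forward pass might look like the following. The 256x256 input size, the ImageNet mean/std normalization, and the reading of the sigmoid output as the probability of FAKE are all assumptions, since the card does not state the preprocessing used in training. `preprocess` and `predict_fake_probability` are hypothetical helper names.

```python
import torch
import torch.nn.functional as F

# Assumed preprocessing values; the model card does not specify them.
INPUT_SIZE = 256
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)


def preprocess(image_rgb: torch.Tensor) -> torch.Tensor:
    """Convert an HxWx3 uint8 RGB tensor into a normalized 1x3xSxS batch."""
    x = image_rgb.permute(2, 0, 1).unsqueeze(0).float() / 255.0
    x = F.interpolate(x, size=(INPUT_SIZE, INPUT_SIZE),
                      mode="bilinear", align_corners=False)
    return (x - IMAGENET_MEAN) / IMAGENET_STD


@torch.no_grad()
def predict_fake_probability(model, image_rgb, device="cpu"):
    """Run one image through `model` and return sigmoid(logit) as a float."""
    model.eval()  # BatchNorm and Dropout must be in inference mode
    batch = preprocess(image_rgb).to(device)
    logit = model(batch)  # expected shape: (1, 1)
    return torch.sigmoid(logit).item()
```

With the `model` and `device` from the loading snippet, an image read via `cv2.imread` would first need converting from BGR to RGB (for example with `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`) and wrapping with `torch.from_numpy` before being passed to `predict_fake_probability(model, image_rgb, device)`.
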

## Training Details

### Training Data

The model was trained on a diverse dataset comprising:
- **Real Images:** FFHQ, CelebA-HQ
- **Deepfake Images:** Generated using StyleGAN2, Diffusion Models, and FaceSwap techniques.
- **Data Augmentation:** Extensive augmentation (compression, noise, blur) was applied to make the model more robust to social media re-compression artifacts.

## Evaluation

### Results

The model achieves high accuracy on standard benchmarks:
- **Test Accuracy:** ~92-95% (on unseen test split)
- **Generalization:** Shows strong resilience to JPEG compression compared to standard CNNs.

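
The card lists accuracy and F1 as its metrics. For reference, the sketch below shows how both are computed from binary predictions, assuming FAKE is encoded as 1 and treated as the positive class (an assumption; the card does not state the label encoding). The labels in the usage example are made up for illustration and are not DeepGuard results.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for two equal-length sequences of binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1


if __name__ == "__main__":
    # Hypothetical labels: 1 = FAKE, 0 = REAL.
    print(accuracy_and_f1([1, 1, 0, 0], [1, 0, 0, 0]))
```

In this toy case one of the two fakes is missed, so accuracy is 0.75 while F1 is about 0.667, which shows why F1 is worth reporting alongside accuracy when missed fakes matter.
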
## Citation

```bibtex
@misc{deepguard2024,
  author = {Asnade, Harshvardhan},
  title = {DeepGuard: Ensemble Deepfake Detection System},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Harshasnade/Deepfake_Detection_System_V1}}
}
```