---
language: en
library_name: pytorch
license: mit
tags:
- deepfake-detection
- image-classification
- video-analysis
- efficientvit
- pytorch
pipeline_tag: image-classification

safetensors:
  total: 1
  format: safetensors
  weight_dtype: float32
  size_in_bytes: 80000000   

model-index:
- name: Deepfake Detection with Improved EfficientViT
  results:
  - task:
      type: image-classification
      name: Deepfake Detection
    dataset:
      type: custom
      name: FaceForensics++, Celeb-DF
    metrics:
      - name: Accuracy
        type: accuracy
        value: 0.8864
      - name: Precision
        type: precision
        value: 0.8920
      - name: Recall
        type: recall
        value: 0.8792
      - name: F1-score
        type: f1
        value: 0.8856

  config: config.json
  metadata:
    model_type: EfficientViT
    num_parameters: 20026725
    precision: float32
    framework: pytorch
    license: mit
    model_format: safetensors
    size: 82MB
---

# Deepfake Detection with Improved EfficientViT

## Model Architecture

![Model Architecture](assets/architecture.png)

## Inference Pipeline

![Inference Pipeline](assets/inference_pipeline.png)


This repository contains a **PyTorch model for deepfake detection** based on an improved **EfficientViT** architecture, trained on video data.  

The model predicts whether a video is **real (0)** or **fake (1)** using both visual information and temporal cues.

---

## 🧩 Model Description

**Architecture:** Improved EfficientViT  
**Backbone:** EfficientNet-B0 for feature extraction  
**Head:** Transformer-based temporal modeling with classification head  
**Input:** Video frames (224×224 RGB images)  
**Output:** Binary label (0=Real, 1=Fake) and frame-level probabilities  
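
Because the model consumes individual frames, a video has to be reduced to a manageable set of them before inference. How this repository selects frames is not stated here, so the sketch below simply assumes evenly spaced sampling. `sample_frame_indices` and `num_samples` are hypothetical names and are not part of `inference.py`.

```python
def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Return up to `num_samples` evenly spaced frame indices.

    Assumption: uniform sampling; the repository's own strategy may differ.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        # Short video: use every frame.
        return list(range(total_frames))
    # Split the video into `num_samples` equal segments and take each midpoint.
    step = total_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


print(sample_frame_indices(100, 4))  # prints [12, 37, 62, 87]
```

Sampling segment midpoints rather than the first `num_samples` frames spreads the selected frames across the whole clip, which matters if manipulation only appears in part of the video.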

**Key Features:**

- Extracts faces from frames using MTCNN  
- Supports inference on raw video files  
- Provides frame-level probabilities for fine-grained analysis  
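
The card mentions frame-level probabilities and a single binary label but does not say how the two are connected. One plausible reduction, shown here only as a sketch, averages the per-frame fake probabilities and thresholds the mean. `aggregate_frame_probs`, the 0.5 threshold, and every returned key other than `class` are assumptions for illustration, not the behavior of `inference.py`.

```python
from statistics import mean


def aggregate_frame_probs(frame_probs: list[float], threshold: float = 0.5) -> dict:
    """Combine per-frame fake probabilities into one video-level decision."""
    if not frame_probs:
        raise ValueError("no frame probabilities (were any faces detected?)")
    video_prob = mean(frame_probs)
    return {
        "class": int(video_prob >= threshold),  # 0 = real, 1 = fake
        "fake_probability": video_prob,
        # Index of the highest-scoring frame, useful for manual inspection.
        "most_suspicious_frame": max(
            range(len(frame_probs)), key=frame_probs.__getitem__
        ),
    }


print(aggregate_frame_probs([0.9, 0.8, 0.4])["class"])  # prints 1
```

Averaging is only one option; a median or a "fraction of frames above threshold" rule would be less sensitive to a few extreme frames.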

---

## 📁 Repository Structure

```
deepfake-efficientvit/
│
├── model.py                  # ImprovedEfficientViT class
├── inference.py              # Functions to run inference on videos
├── model.pth                 # Trained weights
├── config.json               # Optional model metadata
├── requirements.txt          # Required packages
└── README.md
```

## ⚡ Installation
Clone the repository and install the dependencies:

    git clone https://huggingface.co/faisalishfaq2005/deepfake-detection-efficientnet-vit
    cd deepfake-detection-efficientnet-vit
    pip install -r requirements.txt
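
After installing, it can be useful to confirm that the packages imported by the usage example in this README are actually available. The check below is a small sketch: the contents of `requirements.txt` are not listed here, so the `required` list is an assumption based only on those imports, and `missing_modules` is a hypothetical helper.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: these are the third-party modules the usage example imports.
required = ["torch", "safetensors", "huggingface_hub"]
missing = missing_modules(required)
print("Missing modules:", ", ".join(missing) if missing else "none")
```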

## 🚀 Usage
### 1. Programmatic Inference

```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
import torch

from model import ImprovedEfficientViT
from inference import predict_vedio

# 1️⃣ Download the checkpoint from Hugging Face
checkpoint_path = hf_hub_download(
    repo_id="faisalishfaq2005/deepfake-detection-efficientnet-vit",
    filename="model.safetensors",
)

# 2️⃣ Load the weights on CPU and put the model in evaluation mode
state_dict = load_file(checkpoint_path, device="cpu")
model = ImprovedEfficientViT()
model.load_state_dict(state_dict)
model.eval()

# 3️⃣ Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# 4️⃣ Run inference on a video
video_path = "sample_video.mp4"
result = predict_vedio(video_path, model)
print(result)
# Example output: {'class': 1}
```
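
The example output above is a dictionary with a numeric `class` field, where 0 means real and 1 means fake. A minimal sketch for turning that into a readable label follows; `describe_result` and `LABELS` are hypothetical names, and the code assumes the result has exactly the shape shown in the example output.

```python
LABELS = {0: "real", 1: "fake"}


def describe_result(result: dict) -> str:
    """Turn a prediction dict such as {'class': 1} into a readable label."""
    label = result.get("class")
    if label not in LABELS:
        raise ValueError(f"unexpected prediction: {result!r}")
    return LABELS[label]


print(describe_result({"class": 1}))  # prints "fake"
```

Raising on an unknown value, instead of defaulting to one label, avoids silently reporting a video as real when the prediction format has changed.
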
### 2. Manual Download

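1. Go to the Hugging Face model page.
2. Download the following files:
   - `model.pth`
   - `model.py`
   - `inference.py`
3. Place them in the same local folder.
4. Install the requirements and run the prediction function from `inference.py` (imported as `predict_vedio` in the example above).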
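
Before running inference from a manual download, a quick check that all three files sit in the same folder can save a confusing import or load error. This is only a sketch: `missing_files` is a hypothetical helper, and the file names are taken from the download list above.

```python
from pathlib import Path

# File names as listed in the manual download steps.
REQUIRED_FILES = ("model.pth", "model.py", "inference.py")


def missing_files(folder: str, required=REQUIRED_FILES) -> list[str]:
    """Return the required file names that are not present in `folder`."""
    base = Path(folder)
    return [name for name in required if not (base / name).is_file()]


missing = missing_files(".")
print("Missing files:", ", ".join(missing) if missing else "none")
```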

## 📄 License

This model is released under the MIT License.
You are free to use, modify, and distribute it, provided the copyright and license notice are retained.

## 📚 Citation

If you use this model in your research, please cite:

```bibtex
@inproceedings{faisalishfaq2025efficientvit,
  title={Deepfake Detection with Efficientnet and ViT},
  author={Faisal Ishfaq},
  year={2025}
}
```