---
language: en
library_name: pytorch
license: mit
tags:
- deepfake-detection
- image-classification
- video-analysis
- efficientvit
- pytorch
pipeline_tag: image-classification

safetensors:
  total: 1
  format: safetensors
  weight_dtype: float32
  size_in_bytes: 80000000   

model-index:
- name: Deepfake Detection with Improved EfficientViT
  results:
  - task:
      type: image-classification
      name: Deepfake Detection
    dataset:
      type: custom
      name: FaceForensics++,Celeb-DF
    metrics:
      - name: Accuracy
        type: accuracy
        value: 0.8864
      - name: Precision
        type: precision
        value: 0.8920
      - name: Recall
        type: recall
        value: 0.8792
      - name: F1-score
        type: f1
        value: 0.8856

  config: config.json
  metadata:
    model_type: EfficientViT
    num_parameters: 20026725
    precision: float32
    framework: pytorch
    license: mit
    model_format: safetensors
    size: 82MB
---

# Deepfake Detection with Improved EfficientViT

## Model Architecture

![Model Architecture](assets/architecture.png)

## Inference Pipeline

![Inference Pipeline](assets/inference_pipeline.png)


This repository contains a **PyTorch model for deepfake detection** based on an improved **EfficientViT** architecture, trained on video data.  

The model predicts whether a video is **real (0)** or **fake (1)** using both visual information and temporal cues.

---

## 🧩 Model Description

**Architecture:** Improved EfficientViT  
**Backbone:** EfficientNet-B0 for feature extraction  
**Head:** Transformer-based temporal modeling with classification head  
**Input:** Video frames (224×224 RGB images)  
**Output:** Binary label (0=Real, 1=Fake) and frame-level probabilities  
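
The fields above describe a per-frame CNN backbone followed by a Transformer over time. A minimal PyTorch sketch of that pattern is shown below. It illustrates the idea under stated assumptions and is not the repository's `ImprovedEfficientViT`: the class name `FrameTransformerClassifier`, the token size, the layer and head counts, and averaging frame logits into a video logit are all hypothetical choices. The backbone is passed in, so torchvision's `efficientnet_b0().features` (1280 output channels) could be plugged in to match the EfficientNet-B0 backbone named above.

```python
import torch
import torch.nn as nn


class FrameTransformerClassifier(nn.Module):
    """Hypothetical sketch: per-frame CNN features -> Transformer over time -> real/fake head."""

    def __init__(self, backbone: nn.Module, feat_dim: int, d_model: int = 256,
                 num_layers: int = 2, num_heads: int = 4, num_classes: int = 2):
        super().__init__()
        self.backbone = backbone                  # (N, 3, H, W) -> (N, feat_dim, h, w)
        self.pool = nn.AdaptiveAvgPool2d(1)       # global average pool per frame
        self.proj = nn.Linear(feat_dim, d_model)  # one token per frame
        layer = nn.TransformerEncoderLayer(d_model, num_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, frames: torch.Tensor):
        # frames: (B, T, 3, H, W), e.g. T face crops of 224x224 RGB
        b, t = frames.shape[:2]
        x = self.backbone(frames.flatten(0, 1))    # (B*T, C, h, w)
        x = self.pool(x).flatten(1)                # (B*T, C)
        x = self.proj(x).view(b, t, -1)            # (B, T, d_model)
        x = self.temporal(x)                       # self-attention across frames
        frame_logits = self.head(x)                # (B, T, num_classes)
        video_logits = frame_logits.mean(dim=1)    # assumed pooling: mean over frames
        frame_probs = frame_logits.softmax(dim=-1)[..., 1]  # per-frame P(fake), class 1
        return video_logits, frame_probs
```

For example, `FrameTransformerClassifier(torchvision.models.efficientnet_b0(weights=None).features, feat_dim=1280)` builds the sketch around an EfficientNet-B0 backbone. Mean pooling over frames is only one option; a learned classification token is a common alternative.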

**Key Features:**

- Extracts faces from frames using MTCNN  
- Supports inference on raw video files  
- Provides frame-level probabilities for fine-grained analysis  
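
Video-level inference of this kind typically samples a fixed number of frames, classifies the face found in each, and pools the per-frame scores into one label. The plain-Python sketch below covers only the sampling and pooling steps; face detection with MTCNN would sit between them. The helper names, the centre-of-segment sampling, mean pooling, and the 0.5 threshold are assumptions for illustration, not the behaviour of the repository's `inference.py`.

```python
from statistics import mean


def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick frame indices spread evenly across a video (hypothetical helper)."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    # Take the centre frame of each of num_samples equal-width segments
    return [int(step * i + step / 2) for i in range(num_samples)]


def aggregate_video_label(frame_probs: list[float], threshold: float = 0.5) -> dict:
    """Average per-frame P(fake) and threshold it: 0 = real, 1 = fake (assumed rule)."""
    if not frame_probs:
        raise ValueError("no frames with a detected face")
    score = mean(frame_probs)
    return {"class": int(score >= threshold), "score": score, "frame_probs": frame_probs}


print(sample_frame_indices(300, 5))                        # [30, 90, 150, 210, 270]
print(aggregate_video_label([0.2, 0.9, 0.7])["class"])     # 1
```

Keeping the per-frame probabilities in the result is what allows the fine-grained, frame-level analysis mentioned above.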

---

## 📁 Repository Structure

```
deepfake-detection-efficientnet-vit/
│
├── model.py                  # ImprovedEfficientViT class
├── inference.py              # Functions to run inference on videos
├── model.pth                 # Trained weights
├── config.json               # Optional model metadata
├── requirements.txt          # Required packages
└── README.md
```

## ⚑ Installation

```bash
# Git LFS is needed to fetch the weight files when cloning
git lfs install
git clone https://huggingface.co/faisalishfaq2005/deepfake-detection-efficientnet-vit
cd deepfake-detection-efficientnet-vit
pip install -r requirements.txt
```

## 🚀 Usage

### 1. Programmatic Inference

```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
import torch

# model.py and inference.py ship with this repository; run from the cloned folder
from model import ImprovedEfficientViT
from inference import predict_vedio

# 1️⃣ Download the checkpoint from Hugging Face
checkpoint_path = hf_hub_download(
    repo_id="faisalishfaq2005/deepfake-detection-efficientnet-vit",
    filename="model.safetensors"
)

# 2️⃣ Load the model weights safely
state_dict = load_file(checkpoint_path, device="cpu")
model = ImprovedEfficientViT()
model.load_state_dict(state_dict)
model.eval()

# 3️⃣ Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# 4️⃣ Run inference on a video
video_path = "sample_video.mp4"
result = predict_vedio(video_path, model)
print(result)
# Example output: {'class': 1}
```
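
After loading the weights, a quick sanity check is to count the model's parameters and compare the result with the `num_parameters: 20026725` entry in the model card metadata. At 4 bytes per float32 value, 20,026,725 parameters occupy 80,106,900 bytes (about 80.1 MB), roughly consistent with the size fields in the metadata. The helper below is a hypothetical sketch and is not part of this repository.

```python
import torch.nn as nn


def summarize_parameters(model: nn.Module) -> dict:
    """Count parameters and estimate their size in MB (hypothetical helper)."""
    params = list(model.parameters())
    num_parameters = sum(p.numel() for p in params)
    # element_size() is 4 bytes for float32 tensors
    size_bytes = sum(p.numel() * p.element_size() for p in params)
    return {"num_parameters": num_parameters, "size_mb": size_bytes / 1e6}


# Toy module for illustration: 10*2 weights + 2 biases = 22 parameters
print(summarize_parameters(nn.Linear(10, 2))["num_parameters"])  # 22
```

With the real model, call `summarize_parameters(model)` after `load_state_dict`; the count should match the metadata if that figure was computed over all parameters in the same way.
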
### 2. Manual Download

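1. Go to the [Hugging Face model page](https://huggingface.co/faisalishfaq2005/deepfake-detection-efficientnet-vit).
2. Download the following files and place them in the same local folder:
   - `model.pth`
   - `model.py`
   - `inference.py`
   - `requirements.txt`
3. Install the requirements and run `predict_vedio()` from `inference.py`.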
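
The programmatic example loads `model.safetensors`; if you follow the manual route with `model.pth` instead, the weights can be loaded with `torch.load`. The sketch below assumes `model.pth` holds either a plain state dict or a dict with a `state_dict` key. That layout is an assumption, so inspect the file if loading fails. The helper name `load_pth_weights` is hypothetical, and `weights_only=True` requires PyTorch 1.13 or newer.

```python
import torch
import torch.nn as nn


def load_pth_weights(model: nn.Module, path: str = "model.pth") -> nn.Module:
    """Load a .pth checkpoint into model on CPU (hypothetical helper)."""
    # weights_only=True restricts unpickling to tensors and plain containers
    checkpoint = torch.load(path, map_location="cpu", weights_only=True)
    # Unwrap a {"state_dict": ...} wrapper if present (assumed checkpoint layout)
    state_dict = checkpoint.get("state_dict", checkpoint)
    model.load_state_dict(state_dict)
    return model.eval()
```

With the repository files in place, `model = load_pth_weights(ImprovedEfficientViT())` would replace the safetensors loading step of the programmatic example.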

## 📄 License

This model is released under the MIT License.
You are free to use, modify, and distribute it, provided the copyright notice and permission notice are included in all copies or substantial portions.

## 📚 Citation

If you use this model in your research, please cite:

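```bibtex
@misc{faisalishfaq2025efficientvit,
  title        = {Deepfake Detection with {EfficientNet} and {ViT}},
  author       = {Faisal Ishfaq},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/faisalishfaq2005/deepfake-detection-efficientnet-vit}}
}
```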