---
license: mit
language:
- en
tags:
- deepfake-detection
- video-classification
- clip
- computer-vision
- pytorch
- face-forensics
- identity-verification
- video-analysis
- insightface
- arcface
pipeline_tag: video-classification
datasets:
- godmodes/rtfs-10k
- hi-paris/FakeParts
- bitmind/FaceForensicsC23
metrics:
- roc_auc
- accuracy
- f1
- precision
- recall
model-index:
- name: VIPER v3
  results:
  - task:
      type: video-classification
      name: Deepfake Detection
    metrics:
    - name: AUC-ROC
      type: roc_auc
      value: 0.9909
    - name: Accuracy
      type: accuracy
      value: 0.952
    - name: F1 (Fake)
      type: f1
      value: 0.96
    - name: Precision (Fake)
      type: precision
      value: 0.948
    - name: Recall (Fake)
      type: recall
      value: 0.965
---
# VIPER: Video Identity Perturbation and Extraction Residual
<p align="center">
<b>Deepfake detection inspired by displacement reactions in chemistry.</b><br>
<i>A stronger identity signal displaces and exposes synthetic faces.</i>
</p>
<p align="center">
<a href="https://github.com/rxbinsingh/VIPER"><img src="https://img.shields.io/badge/GitHub-Code-blue?logo=github" /></a>
<a href="https://huggingface.co/spaces/rxbinsingh/VIPER"><img src="https://img.shields.io/badge/🤗-Live%20Demo-green" /></a>
<a href="https://colab.research.google.com/github/rxbinsingh/VIPER/blob/main/notebooks/VIPER_Train_Colab.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
</p>
---
![VIPER Banner](assets/Viper_main1.png)
---
## Core Idea
> *What if we could expose deepfakes the way chemistry exposes impurities?*
![Displacement Reaction](assets/Displacement_reaction.png)
```
AB + C → AC + B
AB = video frame (fake face B hidden inside context A)
C  = identity anchor (biometric fingerprint from first 8 frames)
AC = anchor bonds with real context → LOW energy  = REAL
B  = fake face displaced/exposed    → HIGH energy = FAKE
```
---
## Results
![Results](assets/VIPER_Results1.png)
| Metric | Value |
|:-------|------:|
| **AUC-ROC** | **0.9909** |
| **Accuracy** | **95.2%** |
| **Fake Recall** | **96.5%** |
| **False Positive Rate** | **6.3%** |
| Face-swap AUC | 0.9931 |
| Expression-swap AUC | 0.9847 |
| Inference speed | ~4s/video (GPU) |
| Training time | 25 min (T4) |
| Training data | 530 videos |
### Per-Manipulation-Type Detection
![Multiple Types](assets/multiple_types.png)
| Attack Type | AUC | Accuracy | N (test) |
|:------------|----:|---------:|--------:|
| Face swap (inswapper) | 0.9931 | 95.6% | 42 |
| Expression transfer (NeuralTextures) | 0.9847 | 93.7% | 15 |
| **All combined** | **0.9909** | **95.2%** | **105** |
### Model Progression
| Version | Backbone | Trainable Params | Test AUC |
|:--------|:---------|:----------------:|---------:|
| v1 | EfficientNet-B4 (frozen) | ~500K | 0.9072 |
| v2 | EfficientNet-B4 (unfrozen) | ~2.3M | 0.9309 |
| **v3** | **CLIP ViT-L/14 (frozen)** | **~500K** | **0.9909** |
---
## Architecture
![Architecture](assets/Viper_Architecture.png)
```
Video → InsightFace → 16 face crops (224×224)
            │
            ├── Identity Anchor → GIR + TFR + BCR → 16-dim features
            │
            └── CLIP ViT-L/14 (frozen) → 768-dim video embedding
                        │
                        ▼
    Fusion MLP [784 → 512 → 128 → 1] + TTA → REAL / FAKE
```
**Key design:** The CLIP backbone is entirely frozen; only the ~500K-parameter fusion MLP is trained. Keeping the trainable part this small is what lets the model reach 0.99 AUC from just 530 videos.
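
The sizes in the diagram can be checked with a little arithmetic. The sketch below is illustrative and not taken from the repository: it counts the learnable parameters of a 784 → 512 → 128 → 1 MLP with batch normalization after each hidden layer, which is the layout shown in the Usage section. The helper names are hypothetical.

```python
# Illustrative parameter count for the fusion head; helper names are hypothetical.
CLIP_DIM = 768        # CLIP ViT-L/14 video embedding size
BIOMETRIC_DIM = 16    # GIR + TFR + BCR feature size
FUSED_DIM = CLIP_DIM + BIOMETRIC_DIM  # 784, the MLP input width


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features: int) -> int:
    """Learnable scale and shift of a batch-norm layer."""
    return 2 * n_features


head_params = (
    linear_params(FUSED_DIM, 512) + batchnorm_params(512)
    + linear_params(512, 128) + batchnorm_params(128)
    + linear_params(128, 1)
)
print(FUSED_DIM, head_params)  # 784 468993
```

That is roughly 0.47M trainable parameters, consistent with the ~500K figure quoted for v3.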
### Three Biometric Signals
| Signal | Method | Captures |
|:------:|:-------|:---------|
| **GIR** | ArcFace cosine distance | Skull geometry, eye spacing |
| **TFR** | DCT KL divergence | Skin micro-texture |
| **BCR** | dlib landmark coupling | Facial muscle dynamics |
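
As a rough illustration of the GIR signal, the sketch below builds an identity anchor from the first 8 frames and scores each of the 16 frames by its cosine distance to that anchor. It is a sketch under assumptions, not the repository's implementation: the 512-dim vectors are random stand-ins for ArcFace embeddings, averaging the first frames is an assumed way of forming the anchor, and `identity_residual` is a hypothetical name.

```python
import numpy as np


def identity_residual(embeddings: np.ndarray, anchor_frames: int = 8) -> np.ndarray:
    """Cosine distance of every frame embedding to an anchor built from the first frames.

    embeddings: (num_frames, dim) array of per-frame face embeddings.
    Returns a (num_frames,) array; 0 means same direction, 2 means opposite.
    """
    # L2-normalise so that dot products are cosine similarities.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Assumed anchor: mean of the first `anchor_frames` unit vectors, re-normalised.
    anchor = unit[:anchor_frames].mean(axis=0)
    anchor /= np.linalg.norm(anchor)
    return 1.0 - unit @ anchor


rng = np.random.default_rng(0)
identity = rng.normal(size=512)
# Stand-in for ArcFace output: one identity plus small per-frame noise.
frames = identity + 0.1 * rng.normal(size=(16, 512))
# Simulate an identity change in the last four frames.
frames[12:] = rng.normal(size=(4, 512))

residual = identity_residual(frames)
print(residual.round(3))  # small for frames 0-11, large for frames 12-15
```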
---
## Confusion Matrix
```
Predicted Real Predicted Fake
Actual Real 45 3
Actual Fake 2 55
```
Only **5 errors** out of 105 test videos.
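
The headline metrics follow directly from these four counts. The short check below treats "fake" as the positive class and recomputes them; the variable names are illustrative.

```python
# Counts from the confusion matrix above ("fake" is the positive class).
tn, fp = 45, 3   # actual real: predicted real, predicted fake
fn, tp = 2, 55   # actual fake: predicted real, predicted fake

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
false_positive_rate = fp / (fp + tn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
# accuracy=0.952 precision=0.948 recall=0.965
print(f"fpr={false_positive_rate:.4f} f1={f1:.3f}")
# fpr=0.0625 f1=0.957
```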
---
## Usage
```python
import torch
import open_clip
from huggingface_hub import hf_hub_download
import torch.nn as nn

# Download checkpoint
ckpt = hf_hub_download(repo_id="rxbinsingh/VIPER", filename="viper_best_v3_clip.pt")

# Load CLIP
clip_model, _, _ = open_clip.create_model_and_transforms("ViT-L-14", pretrained="openai")
clip_model.eval()

# Model
class VIPERv3(nn.Module):
    def __init__(self, clip_visual, dropout=0.4):
        super().__init__()
        self.clip = clip_visual
        for p in self.clip.parameters():
            p.requires_grad = False
        self.head = nn.Sequential(
            nn.Linear(784, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(512, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(dropout*0.5),
            nn.Linear(128, 1))

model = VIPERv3(clip_model.visual)
model.load_state_dict(torch.load(ckpt, map_location="cpu"))
model.eval()

# Input: crops (1, 16, 3, 224, 224), hand_feats (1, 16)
# Output: logit → sigmoid → P(fake)
```
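
The snippet above defines the head but not how a clip is pushed through it. The following is a minimal sketch of one plausible forward pass with flip-based TTA, under assumptions this README does not confirm: per-frame embeddings are mean-pooled into the 768-dim video embedding, and the TTA average is taken over probabilities. `FusionSketch`, `TinyEncoder`, and `predict_fake_probability` are hypothetical names, and `TinyEncoder` is only a stand-in for `clip_model.visual` so the example runs without downloading weights.

```python
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Stand-in for clip_model.visual: maps (N, 3, H, W) images to (N, 768)."""

    def __init__(self, dim: int = 768):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(3, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.pool(x).flatten(1))


class FusionSketch(nn.Module):
    """Frozen encoder plus the 784 -> 512 -> 128 -> 1 head from the block above."""

    def __init__(self, encoder: nn.Module, dropout: float = 0.4):
        super().__init__()
        self.clip = encoder
        for p in self.clip.parameters():
            p.requires_grad = False
        self.head = nn.Sequential(
            nn.Linear(784, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(512, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(dropout * 0.5),
            nn.Linear(128, 1),
        )

    def forward(self, crops: torch.Tensor, hand_feats: torch.Tensor) -> torch.Tensor:
        batch, frames = crops.shape[:2]
        with torch.no_grad():
            per_frame = self.clip(crops.flatten(0, 1))             # (B*T, 768)
        # Assumption: average the frame embeddings into one video embedding.
        video = per_frame.reshape(batch, frames, -1).mean(dim=1)   # (B, 768)
        fused = torch.cat([video, hand_feats], dim=1)              # (B, 784)
        return self.head(fused).squeeze(1)                         # (B,) logits


@torch.no_grad()
def predict_fake_probability(model, crops, hand_feats):
    """Average P(fake) over the original and horizontally flipped crops."""
    model.eval()
    p_orig = torch.sigmoid(model(crops, hand_feats))
    p_flip = torch.sigmoid(model(torch.flip(crops, dims=[-1]), hand_feats))
    return 0.5 * (p_orig + p_flip)


model = FusionSketch(TinyEncoder())
crops = torch.rand(1, 16, 3, 224, 224)   # 16 face crops of one video
hand_feats = torch.rand(1, 16)           # 16-dim GIR/TFR/BCR features
prob = predict_fake_probability(model, crops, hand_feats)
print(prob.shape)  # torch.Size([1])
```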
---
## Training Dataset
| Category | Count | Source | License |
|:---------|------:|:-------|:--------|
| Real | 250 | RTFS-10K | CC-BY-SA-4.0 |
| Face swap | 220 | RTFS-10K (inswapper) | CC-BY-SA-4.0 |
| Expression swap | 60 | FaceForensics++ | Academic |
| Full-body GAN | 50 | FakeParts | CC0-1.0 |
| **Total** | **580** | | |
| Usable | 530 | 91.4% success | |
---
## Training Configuration
| Parameter | Value |
|:----------|:------|
| Backbone | CLIP ViT-L/14 (OpenAI, frozen) |
| Classifier | MLP 784 → 512 → 128 → 1 |
| Optimizer | AdamW (lr=3e-4, wd=1e-3) |
| Scheduler | Cosine annealing, 15 epochs |
| Batch size | 8 |
| Loss | BCE with pos_weight=0.758 |
| TTA | Horizontal flip average |
| Hardware | NVIDIA T4 (16GB) |
| Training time | ~25 minutes |
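
The table maps onto standard PyTorch components roughly as follows. This is a sketch, not the repository's training script: the head is rebuilt standalone, random tensors stand in for real fused features and labels, and stepping the scheduler once per epoch with `T_max=15` is an assumption. `BCEWithLogitsLoss` is used because it is the PyTorch BCE variant that accepts `pos_weight`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Trainable head only; the frozen CLIP backbone is omitted here.
head = nn.Sequential(
    nn.Linear(784, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(0.4),
    nn.Linear(512, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(128, 1),
)

EPOCHS, BATCH_SIZE = 15, 8
optimizer = torch.optim.AdamW(head.parameters(), lr=3e-4, weight_decay=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)
# pos_weight < 1 down-weights the positive (fake) class in the loss.
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([0.758]))

# One illustrative step on random stand-in data (not real features).
fused = torch.randn(BATCH_SIZE, 784)
labels = torch.randint(0, 2, (BATCH_SIZE,)).float()

head.train()
optimizer.zero_grad()
loss = criterion(head(fused).squeeze(1), labels)
loss.backward()
optimizer.step()
scheduler.step()  # assumed to be called once per epoch
print(f"loss={loss.item():.4f} lr={scheduler.get_last_lr()[0]:.2e}")
```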
---
## Limitations
- Full-body GAN videos are not detectable because face detection fails on them
- The analytical signals (GIR, TFR, BCR) are individually weak on modern fakes
- Evaluation covers only 105 test videos; larger benchmarks are pending
- Robustness to adversarial attacks on CLIP has not been tested
---
## Citation
```bibtex
@misc{singh2025viper,
title = {VIPER: Deepfake Detection Through Identity-Anchored Visual Representation Analysis},
author = {Singh, Robin},
year = {2025},
url = {https://github.com/rxbinsingh/VIPER}
}
```
---
## Author
**Robin Singh** · Bennett University, India
[![GitHub](https://img.shields.io/badge/GitHub-rxbinsingh-black?logo=github)](https://github.com/rxbinsingh)
[![HuggingFace](https://img.shields.io/badge/🤗-rxbinsingh-FFD21E)](https://huggingface.co/rxbinsingh)
[![ResearchGate](https://img.shields.io/badge/ResearchGate-Robin--Singh--61-00CCBB?logo=researchgate)](https://www.researchgate.net/profile/Robin-Singh-61)