

๐Ÿ” Detect deepfakes with state-of-the-art accuracy

A ResNeXt-101 32×8d backbone — initialised from Instagram's weakly-supervised pretrained weights — fine-tuned to expose AI-generated and manipulated faces with high confidence.


  ✅ Real Face          🚨 Deepfake
  ──────────────        ──────────────
  Confidence: 97.3%     Confidence: 99.1%


📊 Model Overview

  🔖 Property          📋 Value
  ─────────────────    ─────────────────────────────
  🧱 Architecture      ResNeXt-101 32×8d
  🖼️ Input             RGB · 224 × 224 px
  🎯 Task              Binary — Real vs Fake
  🔢 Parameters        ~88 million
  💾 File size         741.62 MB
  🔬 Precision         float32
  ⚙️ Framework         PyTorch
  🌐 Backbone          Instagram WSL pretrained
   Model Pipeline
   ══════════════

   📷 Input Image
        ↓
   🔲 Face Crop & Align
        ↓
   🧮 ResNeXt-101
     ├─ 88M parameters
     ├─ 4 residual stages
     └─ 32× grouped convs
        ↓
   🎯 Classifier Head
        ↓
   ┌──────┬──────┐
   │  ✅  │  🚨  │
   │ REAL │ FAKE │
   └──────┴──────┘

๐Ÿ—๏ธ Architecture

The backbone uses grouped convolutions with cardinality 32 โ€” each layer splits into 32 parallel transformation paths, then aggregates. This lets the network learn diverse artefact patterns (blending seams, frequency inconsistencies, unnatural textures) simultaneously.
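The parameter saving from grouping can be checked directly in PyTorch. The layer below is a stand-alone illustration of a cardinality-32 convolution, not a layer extracted from this checkpoint:

```python
import torch
import torch.nn as nn

# A 3x3 convolution split into 32 groups, as in ResNeXt's "32x8d":
# each group sees only 256 / 32 = 8 input channels.
grouped = nn.Conv2d(256, 256, kernel_size=3, padding=1, groups=32, bias=False)
dense   = nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False)

print(grouped.weight.shape)                            # torch.Size([256, 8, 3, 3])
print(dense.weight.numel() // grouped.weight.numel())  # 32 -- 32x fewer weights

# Output shapes are identical; only the connectivity pattern differs.
x = torch.randn(1, 256, 56, 56)
assert grouped(x).shape == dense(x).shape
```

Because each group learns its own small filter bank, the 32 paths can specialise on different artefact types at the same parameter budget as a much narrower dense layer.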

 ┌──────────────────────────────────────────────────────────────────┐
 │                      ResNeXt-101  32×8d                          │
 ├──────────────────────────────────────────────────────────────────┤
 │                                                                  │
 │  📷 Input  ──▶  🌱 STEM                                          │
 │                 Conv 7×7 │ BN │ ReLU │ MaxPool                   │
 │                 3 → 64 channels                                  │
 │                     │                                            │
 │             ┌───────▼────────┐                                   │
 │             │  🧩 LAYER 1    │  ×3 blocks  · ch 256              │
 │             └───────┬────────┘                                   │
 │             ┌───────▼────────┐                                   │
 │             │  🧩 LAYER 2    │  ×4 blocks  · ch 512              │
 │             └───────┬────────┘                                   │
 │             ┌───────▼────────┐                                   │
 │             │  🧩 LAYER 3    │  ×23 blocks · ch 1024 (deepest)   │
 │             └───────┬────────┘                                   │
 │             ┌───────▼────────┐                                   │
 │             │  🧩 LAYER 4    │  ×3 blocks  · ch 2048             │
 │             └───────┬────────┘                                   │
 │                Global Avg Pool                                   │
 │                     │                                            │
 │             ┌───────▼──────────┐                                 │
 │             │  🎯 FC HEAD      │  2048 → num_classes             │
 │             └──────────────────┘                                 │
 └──────────────────────────────────────────────────────────────────┘

  Each bottleneck block:
  ┌──────────────────────────────────────────────────┐
  │  1×1 Conv (expand)  →  3×3 GroupConv (groups=32) │
  │  →  1×1 Conv (compress)  +  Skip Connection      │
  └──────────────────────────────────────────────────┘
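A minimal PyTorch sketch of that block, with dimensions chosen to match the first stage of a 32×8d network. `ResNeXtBottleneck` is an illustrative name, not a class from this repo:

```python
import torch
import torch.nn as nn

class ResNeXtBottleneck(nn.Module):
    """1x1 expand -> 3x3 grouped conv -> 1x1 compress, plus skip connection."""
    def __init__(self, in_ch, width, out_ch, groups=32):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, width, 1, bias=False)        # 1x1 expand
        self.bn1   = nn.BatchNorm2d(width)
        self.conv2 = nn.Conv2d(width, width, 3, padding=1,
                               groups=groups, bias=False)          # grouped 3x3
        self.bn2   = nn.BatchNorm2d(width)
        self.conv3 = nn.Conv2d(width, out_ch, 1, bias=False)       # 1x1 compress
        self.bn3   = nn.BatchNorm2d(out_ch)
        self.relu  = nn.ReLU(inplace=True)
        # Project the skip path when channel counts differ
        self.skip  = (nn.Identity() if in_ch == out_ch
                      else nn.Conv2d(in_ch, out_ch, 1, bias=False))

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + self.skip(x))

# Stage-1 dimensions for 32x8d: 64 in, 3x3 width = 32 groups x 8 = 256, 256 out
block = ResNeXtBottleneck(in_ch=64, width=256, out_ch=256)
y = block(torch.randn(1, 64, 56, 56))
print(y.shape)   # torch.Size([1, 256, 56, 56])
```

The grouped 3×3 in the middle is where the 32 parallel paths live; the two 1×1 convolutions merge them back into a shared representation before the residual addition.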

🚀 Quick Start

1️⃣   Install dependencies

# Clone the repo
git clone https://github.com/accel-reg/deepfake-detection.git
cd deepfake-detection

# Install requirements
pip install -r requirements.txt

📦 What's in requirements.txt?
torch>=1.13
torchvision>=0.14
Pillow
opencv-python
huggingface_hub

2️⃣   Load the model

import torch
from model import DeepfakeDetector   # from the repo

# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
#  Option A — load local file
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
model = DeepfakeDetector()
model.load_state_dict(torch.load("ig.bin", map_location="cpu"))
model.eval()

# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
#  Option B — pull from HuggingFace 🤗
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
from huggingface_hub import hf_hub_download

path  = hf_hub_download(repo_id="accel69/depfake-detection", filename="ig.bin")
model = DeepfakeDetector()
model.load_state_dict(torch.load(path, map_location="cpu"))
model.eval()

3️⃣   Preprocess & predict

from torchvision import transforms
from PIL import Image

# ── Standard ImageNet preprocessing ──────────────────────────────
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std =[0.229, 0.224, 0.225]
    ),
])

# ── Predict ──────────────────────────────────────────────────────
img   = Image.open("face.jpg").convert("RGB")
x     = transform(img).unsqueeze(0)           # → (1, 3, 224, 224)

with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)
    pred  = probs.argmax(dim=1).item()

label      = "🚨  FAKE" if pred == 1 else "✅  REAL"
confidence = probs[0, pred].item()

print(f"  Result     : {label}")
print(f"  Confidence : {confidence:.2%}")

🖼️ Inference Examples

📁 Batch inference on a folder

from pathlib import Path

image_dir = Path("frames/")
results   = {"real": 0, "fake": 0}

for img_path in sorted(image_dir.glob("*.jpg")):
    img   = Image.open(img_path).convert("RGB")
    x     = transform(img).unsqueeze(0)

    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)

    is_fake    = probs.argmax().item() == 1
    confidence = probs.max().item()
    label      = "🚨 FAKE" if is_fake else "✅ REAL"

    results["fake" if is_fake else "real"] += 1
    print(f"  {img_path.name:<35} {label}  ({confidence:.2%})")

print(f"\n  📊 Summary — ✅ Real: {results['real']}  |  🚨 Fake: {results['fake']}")

⚡ GPU acceleration

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model  = model.to(device)

print(f"  🔥 Running on : {device}")
print(f"  ⚡ GPU        : {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'N/A'}")

# Move input to same device
x = x.to(device)

with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)

🎥 Video frame-by-frame analysis

import cv2

cap          = cv2.VideoCapture("video.mp4")
fake_frames  = 0
total_frames = 0

print("  🎬 Analysing video...")

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    img = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    x   = transform(img).unsqueeze(0)

    with torch.no_grad():
        pred = model(x).argmax(dim=1).item()

    fake_frames  += pred
    total_frames += 1

cap.release()

fake_pct = fake_frames / max(total_frames, 1)   # guard against empty/unreadable videos
verdict  = "🚨 LIKELY DEEPFAKE" if fake_pct > 0.5 else "✅ LIKELY REAL"

print(f"\n  ┌─────────────────────────────────┐")
print(f"  │  🎬 Total frames  : {total_frames:<12}│")
print(f"  │  🚨 Fake frames   : {fake_frames:<12}│")
print(f"  │  ✅ Real frames   : {total_frames-fake_frames:<12}│")
print(f"  │  📊 Fake ratio    : {fake_pct:<11.1%} │")
print(f"  │  🏁 Verdict       : {verdict:<12}│")
print(f"  └─────────────────────────────────┘")
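The hard argmax count above treats a 51%-fake frame the same as a 99%-fake one. A softer aggregation averages the per-frame fake probabilities instead, which is less sensitive to borderline frames. A plain-Python sketch — the 0.5 threshold is a convention here, not a calibrated value:

```python
def aggregate_video(fake_probs, threshold=0.5):
    """Average per-frame P(fake) and compare to a decision threshold."""
    if not fake_probs:                    # guard: no frames were scored
        raise ValueError("no frames scored")
    mean_p = sum(fake_probs) / len(fake_probs)
    verdict = "LIKELY DEEPFAKE" if mean_p > threshold else "LIKELY REAL"
    return mean_p, verdict

# Three confident-fake frames outweigh two borderline-real ones
mean_p, verdict = aggregate_video([0.98, 0.97, 0.96, 0.45, 0.40])
print(f"{mean_p:.2f} -> {verdict}")       # 0.75 -> LIKELY DEEPFAKE
```

Feed it `probs[0, 1].item()` from each frame's softmax output rather than the argmax label.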

๐Ÿ‹๏ธ Training Details

  Training Pipeline
  โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

  ๐Ÿ“ฆ Backbone          Instagram WSL ResNeXt-101 32ร—8d
  ๐Ÿ–ผ๏ธ  Resolution        224 ร— 224 RGB
  ๐Ÿ“ Normalisation     ImageNet mean [0.485 0.456 0.406]
                               std  [0.229 0.224 0.225]
  ๐Ÿ“‰ Loss function     Cross-Entropy
  ๐Ÿ”„ Augmentation      Horizontal flip  ยท  Colour jitter
                       Random crop  ยท  Rotation
โš™๏ธ Hyperparameter ๐Ÿ“‹ Value
๐Ÿงฑ Backbone init Instagram WSL pretrained (WSL-Images)
๐Ÿ“ท Input resolution 224 ร— 224
๐Ÿ“ Normalisation ImageNet mean / std
๐Ÿ“‰ Loss Cross-Entropy
๐Ÿ”„ Augmentations Flip, colour jitter, random crop
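The cross-entropy objective on the two-way head can be sketched in a few lines. The stand-alone `nn.Linear(2048, 2)` below mirrors the FC head's shape but is randomly initialised — it is a schematic of one training step, not the repo's training loop, and the SGD learning rate is arbitrary:

```python
import torch
import torch.nn as nn

head      = nn.Linear(2048, 2)                    # 2048-d pooled features -> 2 logits
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(head.parameters(), lr=1e-3)

feats  = torch.randn(8, 2048)                     # stand-in for backbone output
labels = torch.tensor([0, 1, 1, 0, 1, 0, 0, 1])   # 0 = real, 1 = fake

logits = head(feats)
loss   = criterion(logits, labels)                # scalar cross-entropy

optimizer.zero_grad()
loss.backward()
optimizer.step()

print(f"loss = {loss.item():.3f}")
```

In full fine-tuning the backbone's parameters would be in the optimiser too; freezing early stages and training only the later layers plus the head is a common lighter-weight variant.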

📖 Full configs, dataset prep scripts, and training logs → GitHub Repository


โš ๏ธ Limitations

๐Ÿšง Read before deploying in any production or real-world system.

โš ๏ธ Risk ๐Ÿ“‹ Details
๐Ÿ†• Novel forgery methods May not detect unseen GAN/diffusion techniques
๐Ÿ“ Alignment sensitivity Poor face crop โ†’ lower accuracy. Use a dedicated face detector first
๐ŸŒ Distribution shift Different cameras, compression, or lighting may degrade results
โš–๏ธ Demographic bias Not audited across demographic groups โ€” evaluate independently
๐Ÿ” No temporal context Frame-level only โ€” no multi-frame consistency modelling

🎯 Intended Use

✅ Good uses

  • 🎓 Academic deepfake research
  • 📰 Media integrity & journalism tools
  • 🏆 Benchmarking forgery detectors
  • 🔬 CV/ML research pipelines

❌ Not intended for

  • 🚫 Automated moderation without human review
  • 🚫 Surveillance or individual profiling
  • 🚫 Legal evidence without expert validation
  • 🚫 Any application that could harm individuals

📜 License & Citation

Released under the MIT License — free to use, modify, and distribute with attribution.

@misc{ig-deepfake-detection-2025,
  author    = {accel69},
  title     = {ig.bin — Deepfake Face Detection with ResNeXt-101 32x8d},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/accel69/depfake-detection}
}

๐Ÿ™ GitHub  ยท  ๐Ÿค— HuggingFace  ยท  MIT License

โญ Star the repo if this helped you!
