

๐Ÿ” Detect deepfakes with state-of-the-art accuracy

A ResNeXt-101 32×8d backbone — initialised from Instagram's weakly-supervised pretrained weights — fine-tuned to expose AI-generated and manipulated faces with high confidence.


  ✅ Real Face          🚨 Deepfake
  ──────────────        ──────────────
  Confidence: 97.3%     Confidence: 99.1%


📊 Model Overview

  🔖 Property          📋 Value
  ─────────────────    ─────────────────────────────
  🧱 Architecture      ResNeXt-101 32×8d
  🖼️ Input             RGB · 224 × 224 px
  🎯 Task              Binary — Real vs Fake
  🔢 Parameters        ~88 million
  💾 File size         741.62 MB
  🔬 Precision         float32
  ⚙️ Framework         PyTorch
  🌐 Backbone          Instagram WSL pretrained
   Model Pipeline
   ══════════════

   📷 Input Image
        ↓
   🔲 Face Crop & Align
        ↓
   🧮 ResNeXt-101
     ├─ 88M parameters
     ├─ 4 residual stages
     └─ 32× grouped convs
        ↓
   🎯 Classifier Head
        ↓
   ┌──────┬──────┐
   │  ✅  │  🚨  │
   │ REAL │ FAKE │
   └──────┴──────┘

๐Ÿ—๏ธ Architecture

The backbone uses grouped convolutions with cardinality 32 โ€” each layer splits into 32 parallel transformation paths, then aggregates. This lets the network learn diverse artefact patterns (blending seams, frequency inconsistencies, unnatural textures) simultaneously.
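The parameter saving from grouping can be checked directly in PyTorch. The layer below is a stand-alone illustration of a cardinality-32 convolution, not a layer extracted from this checkpoint:

```python
import torch
import torch.nn as nn

# A 3x3 convolution split into 32 groups, as in ResNeXt's "32x8d":
# each group sees only 256 / 32 = 8 input channels.
grouped = nn.Conv2d(256, 256, kernel_size=3, padding=1, groups=32, bias=False)
dense   = nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False)

print(grouped.weight.shape)                            # torch.Size([256, 8, 3, 3])
print(dense.weight.numel() // grouped.weight.numel())  # 32 -- 32x fewer weights

# Output shapes are identical; only the connectivity pattern differs.
x = torch.randn(1, 256, 56, 56)
assert grouped(x).shape == dense(x).shape
```

Because each group learns its own small filter bank, the 32 paths can specialise on different artefact types at the same parameter budget as a much narrower dense layer.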

 ┌──────────────────────────────────────────────────────────────────┐
 │                      ResNeXt-101  32×8d                          │
 ├──────────────────────────────────────────────────────────────────┤
 │                                                                  │
 │  📷 Input  ──▶  🌱 STEM                                          │
 │                 Conv 7×7 │ BN │ ReLU │ MaxPool                   │
 │                 3 → 64 channels                                  │
 │                     │                                            │
 │             ┌───────▼────────┐                                   │
 │             │  🧩 LAYER 1    │  ×3 blocks  · ch 256              │
 │             └───────┬────────┘                                   │
 │             ┌───────▼────────┐                                   │
 │             │  🧩 LAYER 2    │  ×4 blocks  · ch 512              │
 │             └───────┬────────┘                                   │
 │             ┌───────▼────────┐                                   │
 │             │  🧩 LAYER 3    │  ×23 blocks · ch 1024 (deepest)   │
 │             └───────┬────────┘                                   │
 │             ┌───────▼────────┐                                   │
 │             │  🧩 LAYER 4    │  ×3 blocks  · ch 2048             │
 │             └───────┬────────┘                                   │
 │                Global Avg Pool                                   │
 │                     │                                            │
 │             ┌───────▼──────────┐                                 │
 │             │  🎯 FC HEAD      │  2048 → num_classes             │
 │             └──────────────────┘                                 │
 └──────────────────────────────────────────────────────────────────┘

  Each bottleneck block:
  ┌──────────────────────────────────────────────────┐
  │  1×1 Conv (expand)  →  3×3 GroupConv (groups=32) │
  │  →  1×1 Conv (compress)  +  Skip Connection      │
  └──────────────────────────────────────────────────┘
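A minimal PyTorch sketch of that block, with dimensions chosen to match the first stage of a 32×8d network. `ResNeXtBottleneck` is an illustrative name, not a class from this repo:

```python
import torch
import torch.nn as nn

class ResNeXtBottleneck(nn.Module):
    """1x1 expand -> 3x3 grouped conv -> 1x1 compress, plus skip connection."""
    def __init__(self, in_ch, width, out_ch, groups=32):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, width, 1, bias=False)        # 1x1 expand
        self.bn1   = nn.BatchNorm2d(width)
        self.conv2 = nn.Conv2d(width, width, 3, padding=1,
                               groups=groups, bias=False)          # grouped 3x3
        self.bn2   = nn.BatchNorm2d(width)
        self.conv3 = nn.Conv2d(width, out_ch, 1, bias=False)       # 1x1 compress
        self.bn3   = nn.BatchNorm2d(out_ch)
        self.relu  = nn.ReLU(inplace=True)
        # Project the skip path when channel counts differ
        self.skip  = (nn.Identity() if in_ch == out_ch
                      else nn.Conv2d(in_ch, out_ch, 1, bias=False))

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + self.skip(x))

# Stage-1 dimensions for 32x8d: 64 in, 3x3 width = 32 groups x 8 = 256, 256 out
block = ResNeXtBottleneck(in_ch=64, width=256, out_ch=256)
y = block(torch.randn(1, 64, 56, 56))
print(y.shape)   # torch.Size([1, 256, 56, 56])
```

The grouped 3×3 in the middle is where the 32 parallel paths live; the two 1×1 convolutions merge them back into a shared representation before the residual addition.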

🚀 Quick Start

1️⃣   Install dependencies

# Clone the repo
git clone https://github.com/accel-reg/deepfake-detection.git
cd deepfake-detection

# Install requirements
pip install -r requirements.txt

📦 What's in requirements.txt?
torch>=1.13
torchvision>=0.14
Pillow
opencv-python
huggingface_hub

2️⃣   Load the model

import torch
from model import DeepfakeDetector   # from the repo

# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
#  Option A — load local file
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
model = DeepfakeDetector()
model.load_state_dict(torch.load("ig.bin", map_location="cpu"))
model.eval()

# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
#  Option B — pull from HuggingFace 🤗
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
from huggingface_hub import hf_hub_download

path  = hf_hub_download(repo_id="accel69/depfake-detection", filename="ig.bin")
model = DeepfakeDetector()
model.load_state_dict(torch.load(path, map_location="cpu"))
model.eval()

3️⃣   Preprocess & predict

from torchvision import transforms
from PIL import Image

# ── Standard ImageNet preprocessing ──────────────────────────────
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std =[0.229, 0.224, 0.225]
    ),
])

# ── Predict ──────────────────────────────────────────────────────
img   = Image.open("face.jpg").convert("RGB")
x     = transform(img).unsqueeze(0)           # → (1, 3, 224, 224)

with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)
    pred  = probs.argmax(dim=1).item()

label      = "🚨  FAKE" if pred == 1 else "✅  REAL"
confidence = probs[0, pred].item()

print(f"  Result     : {label}")
print(f"  Confidence : {confidence:.2%}")

🖼️ Inference Examples

📁 Batch inference on a folder

from pathlib import Path

image_dir = Path("frames/")
results   = {"real": 0, "fake": 0}

for img_path in sorted(image_dir.glob("*.jpg")):
    img   = Image.open(img_path).convert("RGB")
    x     = transform(img).unsqueeze(0)

    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)

    is_fake    = probs.argmax().item() == 1
    confidence = probs.max().item()
    label      = "🚨 FAKE" if is_fake else "✅ REAL"

    results["fake" if is_fake else "real"] += 1
    print(f"  {img_path.name:<35} {label}  ({confidence:.2%})")

print(f"\n  📊 Summary — ✅ Real: {results['real']}  |  🚨 Fake: {results['fake']}")

⚡ GPU acceleration

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model  = model.to(device)

print(f"  🔥 Running on : {device}")
print(f"  ⚡ GPU        : {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'N/A'}")

# Move input to same device
x = x.to(device)

with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)

🎥 Video frame-by-frame analysis

import cv2

cap          = cv2.VideoCapture("video.mp4")
fake_frames  = 0
total_frames = 0

print("  🎬 Analysing video...")

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    img = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    x   = transform(img).unsqueeze(0)

    with torch.no_grad():
        pred = model(x).argmax(dim=1).item()

    fake_frames  += pred
    total_frames += 1

cap.release()

fake_pct = fake_frames / max(total_frames, 1)   # guard against empty/unreadable videos
verdict  = "🚨 LIKELY DEEPFAKE" if fake_pct > 0.5 else "✅ LIKELY REAL"

print(f"\n  ┌─────────────────────────────────┐")
print(f"  │  🎬 Total frames  : {total_frames:<12}│")
print(f"  │  🚨 Fake frames   : {fake_frames:<12}│")
print(f"  │  ✅ Real frames   : {total_frames-fake_frames:<12}│")
print(f"  │  📊 Fake ratio    : {fake_pct:<11.1%} │")
print(f"  │  🏁 Verdict       : {verdict:<12}│")
print(f"  └─────────────────────────────────┘")
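The hard argmax count above treats a 51%-fake frame the same as a 99%-fake one. A softer aggregation averages the per-frame fake probabilities instead, which is less sensitive to borderline frames. A plain-Python sketch — the 0.5 threshold is a convention here, not a calibrated value:

```python
def aggregate_video(fake_probs, threshold=0.5):
    """Average per-frame P(fake) and compare to a decision threshold."""
    if not fake_probs:                    # guard: no frames were scored
        raise ValueError("no frames scored")
    mean_p = sum(fake_probs) / len(fake_probs)
    verdict = "LIKELY DEEPFAKE" if mean_p > threshold else "LIKELY REAL"
    return mean_p, verdict

# Three confident-fake frames outweigh two borderline-real ones
mean_p, verdict = aggregate_video([0.98, 0.97, 0.96, 0.45, 0.40])
print(f"{mean_p:.2f} -> {verdict}")       # 0.75 -> LIKELY DEEPFAKE
```

Feed it `probs[0, 1].item()` from each frame's softmax output rather than the argmax label.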

๐Ÿ‹๏ธ Training Details

  Training Pipeline
  โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

  ๐Ÿ“ฆ Backbone          Instagram WSL ResNeXt-101 32ร—8d
  ๐Ÿ–ผ๏ธ  Resolution        224 ร— 224 RGB
  ๐Ÿ“ Normalisation     ImageNet mean [0.485 0.456 0.406]
                               std  [0.229 0.224 0.225]
  ๐Ÿ“‰ Loss function     Cross-Entropy
  ๐Ÿ”„ Augmentation      Horizontal flip  ยท  Colour jitter
                       Random crop  ยท  Rotation
โš™๏ธ Hyperparameter ๐Ÿ“‹ Value
๐Ÿงฑ Backbone init Instagram WSL pretrained (WSL-Images)
๐Ÿ“ท Input resolution 224 ร— 224
๐Ÿ“ Normalisation ImageNet mean / std
๐Ÿ“‰ Loss Cross-Entropy
๐Ÿ”„ Augmentations Flip, colour jitter, random crop
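The cross-entropy objective on the two-way head can be sketched in a few lines. The stand-alone `nn.Linear(2048, 2)` below mirrors the FC head's shape but is randomly initialised — it is a schematic of one training step, not the repo's training loop, and the SGD learning rate is arbitrary:

```python
import torch
import torch.nn as nn

head      = nn.Linear(2048, 2)                    # 2048-d pooled features -> 2 logits
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(head.parameters(), lr=1e-3)

feats  = torch.randn(8, 2048)                     # stand-in for backbone output
labels = torch.tensor([0, 1, 1, 0, 1, 0, 0, 1])   # 0 = real, 1 = fake

logits = head(feats)
loss   = criterion(logits, labels)                # scalar cross-entropy

optimizer.zero_grad()
loss.backward()
optimizer.step()

print(f"loss = {loss.item():.3f}")
```

In full fine-tuning the backbone's parameters would be in the optimiser too; freezing early stages and training only the later layers plus the head is a common lighter-weight variant.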

📖 Full configs, dataset prep scripts, and training logs → GitHub Repository


โš ๏ธ Limitations

๐Ÿšง Read before deploying in any production or real-world system.

โš ๏ธ Risk ๐Ÿ“‹ Details
๐Ÿ†• Novel forgery methods May not detect unseen GAN/diffusion techniques
๐Ÿ“ Alignment sensitivity Poor face crop โ†’ lower accuracy. Use a dedicated face detector first
๐ŸŒ Distribution shift Different cameras, compression, or lighting may degrade results
โš–๏ธ Demographic bias Not audited across demographic groups โ€” evaluate independently
๐Ÿ” No temporal context Frame-level only โ€” no multi-frame consistency modelling

🎯 Intended Use

✅ Good uses

  • 🎓 Academic deepfake research
  • 📰 Media integrity & journalism tools
  • 🏆 Benchmarking forgery detectors
  • 🔬 CV/ML research pipelines

❌ Not intended for

  • 🚫 Automated moderation without human review
  • 🚫 Surveillance or individual profiling
  • 🚫 Legal evidence without expert validation
  • 🚫 Any application that could harm individuals

📜 License & Citation

Released under the MIT License — free to use, modify, and distribute with attribution.

@misc{ig-deepfake-detection-2025,
  author    = {accel69},
  title     = {ig.bin — Deepfake Face Detection with ResNeXt-101 32x8d},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/accel69/depfake-detection}
}

๐Ÿ™ GitHub  ยท  ๐Ÿค— HuggingFace  ยท  MIT License

โญ Star the repo if this helped you!
