Deepfake Detection: Hybrid ViT + DCT (CIFAKE)

Trained checkpoints for a deepfake image detection project using a hybrid Vision Transformer + DCT frequency-domain architecture.

Models

Checkpoint | Architecture | Test Acc | Test AUC-ROC
baseline_resnet50_best.pt | ResNet-50 (fine-tuned) | - | -
baseline_efficientnet_b4_best.pt | EfficientNet-B4 (fine-tuned) | - | -
dct_only_main_best.pt | DCT-only CNN | - | -
hybrid_main_best.pt | Hybrid ViT-B/16 + DCT | - | -

Dataset

CIFAKE: Real and AI-Generated Synthetic Images. The dataset contains 60K real images (from CIFAR-10) and 60K images generated with Stable Diffusion.

Usage

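import torch
from models.baseline_cnn import ResNet50Classifier  # module from this project's source code

# Build the architecture; pretrained=False because the checkpoint supplies the weights.
model = ResNet50Classifier(pretrained=False)
# map_location="cpu" lets the checkpoint load on machines without a GPU.
model.load_state_dict(torch.load("baseline_resnet50_best.pt", map_location="cpu"))
model.eval()  # switch to inference mode (disables dropout, uses stored batch-norm statistics)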

Architecture

  • ViT branch: vit_base_patch16_224 (timm), CLS token → 768-dim
  • DCT branch: Block-wise 2D DCT on 8×8 tiles → small CNN → 256-dim
  • Fusion: concat(1024) → LayerNorm → Linear(512) → GELU → Dropout(0.3) → Linear(1)
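
The DCT front end and the fusion head listed above can be sketched in plain PyTorch as follows. This is a minimal illustration, not the project's actual code: the names dct_matrix, blockwise_dct and FusionHead are hypothetical, the orthonormal DCT-II normalization is an assumption, and the ViT backbone and the small CNN that produce the 768-dim and 256-dim features are left out.

```python
import math

import torch
import torch.nn as nn


def dct_matrix(n: int = 8) -> torch.Tensor:
    """Orthonormal DCT-II basis of shape (n, n); rows are frequencies."""
    k = torch.arange(n, dtype=torch.float32).unsqueeze(1)  # frequency index
    i = torch.arange(n, dtype=torch.float32).unsqueeze(0)  # sample index
    mat = torch.cos(math.pi * (2 * i + 1) * k / (2 * n)) * math.sqrt(2.0 / n)
    mat[0] = mat[0] / math.sqrt(2.0)  # rescale the DC row so that D @ D.T = I
    return mat


def blockwise_dct(x: torch.Tensor, block: int = 8) -> torch.Tensor:
    """2D DCT applied independently to each non-overlapping block x block tile.

    x has shape (B, C, H, W) with H and W divisible by `block`.
    The output has the same shape, with coefficients stored tile by tile.
    """
    b, c, h, w = x.shape
    d = dct_matrix(block).to(dtype=x.dtype, device=x.device)
    # Split H into (tile row p, in-tile row i) and W into (tile col q, in-tile col j).
    tiles = x.reshape(b, c, h // block, block, w // block, block)
    # Per tile: D @ tile @ D.T, written as one einsum over all tiles.
    coeffs = torch.einsum("ui,bcpiqj,vj->bcpuqv", d, tiles, d)
    return coeffs.reshape(b, c, h, w)


class FusionHead(nn.Module):
    """concat(1024) -> LayerNorm -> Linear(512) -> GELU -> Dropout(0.3) -> Linear(1)."""

    def __init__(self, vit_dim: int = 768, dct_dim: int = 256,
                 hidden: int = 512, p_drop: float = 0.3) -> None:
        super().__init__()
        fused_dim = vit_dim + dct_dim  # 768 + 256 = 1024
        self.net = nn.Sequential(
            nn.LayerNorm(fused_dim),
            nn.Linear(fused_dim, hidden),
            nn.GELU(),
            nn.Dropout(p_drop),
            nn.Linear(hidden, 1),
        )

    def forward(self, vit_feat: torch.Tensor, dct_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([vit_feat, dct_feat], dim=1)  # (B, 1024)
        return self.net(fused).squeeze(1)  # one logit per image, shape (B,)


if __name__ == "__main__":
    # Random stand-ins for the two branch outputs, only to check shapes.
    head = FusionHead().eval()
    logits = head(torch.randn(4, 768), torch.randn(4, 256))
    print(logits.shape)  # torch.Size([4])
```

Because the head ends in Linear(1), it emits a single logit per image. Applying torch.sigmoid to that logit would give a probability, though which class the positive label denotes is not stated here.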