---
license: mit
tags:
  - pytorch
  - autoencoder
  - deepfake-detection
  - cifar10
  - computer-vision
  - image-reconstruction
datasets:
  - cifar10
metrics:
  - mse
library_name: pytorch
---

# Residual Convolutional Autoencoder for Deepfake Detection

## Model Description

This is a 5-stage residual convolutional autoencoder trained on CIFAR-10 for image reconstruction. The model reaches a low reconstruction error (test MSE ≈ 0.0043) and can serve as a foundation for deepfake detection systems that score images by how well they are reconstructed.

## Architecture

- **Encoder:** 5 downsampling stages (spatial resolution 128 → 64 → 32 → 16 → 8 → 4) with residual blocks
- **Latent Dimension:** 512
- **Decoder:** 5 upsampling stages with residual blocks
- **Total Parameters:** 34,849,667
- **Input Size:** 128×128×3 (RGB images)
- **Output Range:** [-1, 1] (Tanh activation)
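
The downsampling chain above can be sanity-checked with the standard convolution output-size formula; each 4×4 convolution with stride 2 and padding 1 halves the spatial resolution. A minimal sketch (plain Python, no PyTorch required):

```python
# Spatial size after each of the 5 stride-2 encoder convolutions.
# out = floor((in + 2*padding - kernel) / stride) + 1
size = 128
sizes = [size]
for _ in range(5):
    size = (size + 2 * 1 - 4) // 2 + 1  # kernel=4, stride=2, padding=1
    sizes.append(size)
print(sizes)  # [128, 64, 32, 16, 8, 4]
```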

## Training Details

### Training Data

- **Dataset:** CIFAR-10 (50,000 training images, 10,000 test images)
- **Image Size:** resized to 128×128
- **Normalization:** mean = 0.5, std = 0.5 per channel (range [-1, 1])

### Training Configuration

- **GPU:** NVIDIA H100 80GB HBM3
- **Batch Size:** 1024
- **Optimizer:** AdamW (lr = 1e-3, weight_decay = 1e-5)
- **Loss Function:** MSE (mean squared error)
- **Scheduler:** ReduceLROnPlateau (factor = 0.5, patience = 5)
- **Epochs:** 100
- **Training Time:** ~26 minutes
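
The configuration above can be wired up roughly as follows. This is a hypothetical sketch (the actual training script is not included in this repository) using a tiny stand-in module in place of the autoencoder:

```python
import torch
import torch.nn as nn

# Stand-in module; the real run trained the autoencoder defined in Usage below.
model = nn.Linear(8, 8)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-5)
criterion = nn.MSELoss()
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.5, patience=5
)

x = torch.randn(16, 8)
for epoch in range(2):  # the actual run used 100 epochs
    optimizer.zero_grad()
    loss = criterion(model(x), x)  # autoencoder target is the input itself
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())  # plateau scheduler steps on (validation) loss
```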

### Training Results

- **Initial Validation Loss:** 0.266256 (epoch 1)
- **Final Validation Loss:** 0.004294 (epoch 100)
- **Final Test Loss:** 0.004290
- **Improvement:** 98.4% reduction in validation loss
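
The 98.4% figure follows directly from the first and last validation losses:

```python
# Relative reduction in validation loss between epoch 1 and epoch 100.
initial, final = 0.266256, 0.004294
reduction = 100 * (1 - final / initial)
print(round(reduction, 1))  # 98.4
```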

## Performance

| Metric | Value |
| --- | --- |
| Test MSE loss | 0.004290 |
| Training time | 26.24 minutes |
| Peak GPU memory | ~40 GB |
| Throughput | ~3,600 samples/sec |

## Usage

### Loading the Model

```python
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download

# Define the model architecture
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += residual
        return self.relu(out)

class ResidualConvAutoencoder(nn.Module):
    def __init__(self, latent_dim=512):
        super().__init__()

        # Encoder
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1),  # 128->64
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            ResidualBlock(64),

            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 64->32
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            ResidualBlock(128),

            nn.Conv2d(128, 256, 4, stride=2, padding=1),  # 32->16
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            ResidualBlock(256),

            nn.Conv2d(256, 512, 4, stride=2, padding=1),  # 16->8
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            ResidualBlock(512),

            nn.Conv2d(512, 512, 4, stride=2, padding=1),  # 8->4
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
        )

        self.fc_encoder = nn.Linear(512 * 4 * 4, latent_dim)
        self.fc_decoder = nn.Linear(latent_dim, 512 * 4 * 4)

        # Decoder
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 512, 4, stride=2, padding=1),  # 4->8
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            ResidualBlock(512),

            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1),  # 8->16
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            ResidualBlock(256),

            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),  # 16->32
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            ResidualBlock(128),

            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 32->64
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            ResidualBlock(64),

            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),  # 64->128
            nn.Tanh()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = x.view(x.size(0), -1)
        latent = self.fc_encoder(x)
        x = self.fc_decoder(latent)
        x = x.view(x.size(0), 512, 4, 4)
        reconstructed = self.decoder(x)
        return reconstructed, latent

# Download and load the model
checkpoint_path = hf_hub_download(
    repo_id="ash12321/deepfake-autoencoder-cifar10-v2",
    filename="model_best_checkpoint.ckpt"
)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = ResidualConvAutoencoder(latent_dim=512).to(device)

checkpoint = torch.load(checkpoint_path, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

print("Model loaded successfully!")
```

### Inference Example

```python
from torchvision import transforms
from PIL import Image

# Prepare image with the same preprocessing used during training
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

image = Image.open("your_image.jpg").convert('RGB')
input_tensor = transform(image).unsqueeze(0).to(device)

# Get reconstruction
with torch.no_grad():
    reconstructed, latent = model(input_tensor)

# Denormalize from [-1, 1] back to [0, 1] for visualization,
# clamping any numerical overshoot from the Tanh output
reconstructed = ((reconstructed * 0.5) + 0.5).clamp(0, 1)
```

## Reconstruction Examples

*(Figure: reconstruction comparison)*

The image above shows 10 original CIFAR-10 test images (top row) and their reconstructions (bottom row), demonstrating the model's reconstruction quality.

## Applications

- **Deepfake Detection:** use reconstruction error as a signal for detecting manipulated images
- **Anomaly Detection:** identify out-of-distribution images based on reconstruction quality
- **Image Compression:** compress images to 512-dimensional latent vectors
- **Feature Extraction:** use the encoder as a feature extractor for downstream tasks
- **Image Denoising:** potential for removing noise through reconstruction
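
For the deepfake-detection use case, scoring by reconstruction error can be sketched as below. The `reconstruction_score` helper is illustrative (not part of this repository); in practice `model` would be the autoencoder loaded above, and flagged scores would be compared against a threshold tuned on held-out data:

```python
import torch

def reconstruction_score(model, x):
    """Per-image MSE between input and reconstruction (higher = more suspicious)."""
    with torch.no_grad():
        recon, _ = model(x)
    return torch.mean((recon - x) ** 2, dim=(1, 2, 3))

# Illustrative check with an identity stand-in for the autoencoder:
# a perfect reconstruction yields zero error.
identity = lambda x: (x, None)
scores = reconstruction_score(identity, torch.zeros(2, 3, 128, 128))
print(scores)  # tensor([0., 0.])
```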

## Limitations

- Trained specifically on CIFAR-10 (32×32 images upscaled to 128×128)
- May not generalize well to real-world high-resolution images without fine-tuning
- Optimized for natural images; performance on synthetic/generated images varies
- Reconstruction quality degrades for images far from the CIFAR-10 distribution

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{deepfake-autoencoder-cifar10-v2,
  author = {ash12321},
  title = {Residual Convolutional Autoencoder for Deepfake Detection},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ash12321/deepfake-autoencoder-cifar10-v2}}
}
```

## Model Card Authors

- ash12321

## Model Card Contact

For questions or issues, please open an issue in the repository.


*Model trained on December 08, 2025*