---
license: mit
tags:
- pytorch
- autoencoder
- deepfake-detection
- cifar10
- computer-vision
- image-reconstruction
datasets:
- cifar10
metrics:
- mse
library_name: pytorch
---

# Residual Convolutional Autoencoder for Deepfake Detection

## Model Description

This is a **5-stage Residual Convolutional Autoencoder** trained on CIFAR-10 for high-quality image reconstruction. The model achieves strong reconstruction quality and can serve as a foundation for deepfake detection systems.

### Architecture

- **Encoder**: 5 downsampling stages (128→64→32→16→8→4) with residual blocks
- **Latent Dimension**: 512
- **Decoder**: 5 upsampling stages with residual blocks
- **Total Parameters**: 34,849,667
- **Input Size**: 128x128x3 (RGB images)
- **Output Range**: [-1, 1] (Tanh activation)

## Training Details

### Training Data

- **Dataset**: CIFAR-10 (50,000 training images, 10,000 test images)
- **Image Size**: Resized to 128x128
- **Normalization**: Mean=0.5, Std=0.5 (range [-1, 1])

### Training Configuration

- **GPU**: NVIDIA H100 80GB HBM3
- **Batch Size**: 1024
- **Optimizer**: AdamW (lr=1e-3, weight_decay=1e-5)
- **Loss Function**: MSE (Mean Squared Error)
- **Scheduler**: ReduceLROnPlateau (factor=0.5, patience=5)
- **Epochs**: 100
- **Training Time**: ~26 minutes

### Training Results

- **Initial Validation Loss**: 0.266256 (Epoch 1)
- **Final Validation Loss**: 0.004294 (Epoch 100)
- **Final Test Loss**: 0.004290
- **Improvement**: 98.4% reduction in loss

## Performance

| Metric | Value |
|--------|-------|
| Test MSE Loss | 0.004290 |
| Training Time | 26.24 minutes |
| GPU Memory | ~40GB peak |
| Throughput | ~3,600 samples/sec |

## Usage

### Loading the Model

```python
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download

# Define the model architecture
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += residual
        return self.relu(out)


class ResidualConvAutoencoder(nn.Module):
    def __init__(self, latent_dim=512):
        super().__init__()
        # Encoder: 5 stride-2 convolutions, 128x128 -> 4x4
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1),    # 128 -> 64
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            ResidualBlock(64),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 64 -> 32
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            ResidualBlock(128),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), # 32 -> 16
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            ResidualBlock(256),
            nn.Conv2d(256, 512, 4, stride=2, padding=1), # 16 -> 8
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            ResidualBlock(512),
            nn.Conv2d(512, 512, 4, stride=2, padding=1), # 8 -> 4
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
        )
        self.fc_encoder = nn.Linear(512 * 4 * 4, latent_dim)
        self.fc_decoder = nn.Linear(latent_dim, 512 * 4 * 4)
        # Decoder: 5 stride-2 transposed convolutions, 4x4 -> 128x128
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 512, 4, stride=2, padding=1),  # 4 -> 8
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            ResidualBlock(512),
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1),  # 8 -> 16
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            ResidualBlock(256),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),  # 16 -> 32
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            ResidualBlock(128),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # 32 -> 64
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            ResidualBlock(64),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),     # 64 -> 128
            nn.Tanh()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = x.view(x.size(0), -1)
        latent = self.fc_encoder(x)
        x = self.fc_decoder(latent)
        x = x.view(x.size(0), 512, 4, 4)
        reconstructed = self.decoder(x)
        return reconstructed, latent


# Download and load the model
checkpoint_path = hf_hub_download(
    repo_id="ash12321/deepfake-autoencoder-cifar10-v2",
    filename="model_best_checkpoint.ckpt"
)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = ResidualConvAutoencoder(latent_dim=512).to(device)

checkpoint = torch.load(checkpoint_path, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

print("Model loaded successfully!")
```

### Inference Example

```python
from torchvision import transforms
from PIL import Image

# Prepare image: resize and normalize to the model's [-1, 1] input range
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

image = Image.open("your_image.jpg").convert('RGB')
input_tensor = transform(image).unsqueeze(0).to(device)

# Get reconstruction
with torch.no_grad():
    reconstructed, latent = model(input_tensor)

# Denormalize back to [0, 1] for visualization
reconstructed = (reconstructed * 0.5) + 0.5
```

## Reconstruction Examples

![Reconstruction Comparison](reconstruction_comparison.png)

The image above shows 10 original CIFAR-10 test images (top row) and their reconstructions (bottom row), demonstrating the model's reconstruction quality.
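For deepfake or anomaly screening, the per-image reconstruction error can be used directly as a score. The sketch below is illustrative, not part of the released repo: `reconstruction_score` and `flag_suspicious` are hypothetical helper names, and the `0.02` threshold is a placeholder that should be calibrated on the score distribution of known-real images (e.g. a high percentile). The demo at the end uses a stand-in identity model so the snippet runs without downloading the checkpoint; in practice you would pass the loaded `ResidualConvAutoencoder`.

```python
import torch
import torch.nn as nn


def reconstruction_score(model, batch):
    """Per-image mean squared reconstruction error in the model's [-1, 1] space.

    `batch` is an (N, 3, 128, 128) tensor normalized like the training data;
    `model` is any module returning (reconstruction, latent).
    """
    model.eval()
    with torch.no_grad():
        reconstructed, _ = model(batch)
    # Average over channel and spatial dims, keeping the batch dimension
    return ((reconstructed - batch) ** 2).mean(dim=(1, 2, 3))


def flag_suspicious(scores, threshold=0.02):
    """Flag images whose error exceeds a threshold (placeholder value)."""
    return scores > threshold


# Sanity check with a stand-in model that "reconstructs" perfectly,
# so every score is exactly zero and nothing is flagged.
class _Identity(nn.Module):
    def forward(self, x):
        return x, None


scores = reconstruction_score(_Identity(), torch.zeros(4, 3, 128, 128))
flags = flag_suspicious(scores)
```

Scores from real images define the baseline; manipulated or out-of-distribution inputs tend to reconstruct worse, so their error pushes above a well-calibrated threshold.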
## Applications

- **Deepfake Detection**: Use reconstruction error as a signal for detecting manipulated images
- **Anomaly Detection**: Identify out-of-distribution images based on reconstruction quality
- **Image Compression**: Compress images to 512-dimensional latent vectors
- **Feature Extraction**: Use the encoder as a feature extractor for downstream tasks
- **Image Denoising**: Potential for removing noise through reconstruction

## Limitations

- Trained specifically on CIFAR-10 (32x32 images upscaled to 128x128)
- May not generalize well to real-world high-resolution images without fine-tuning
- Optimized for natural images; performance on synthetic/generated images varies
- Reconstruction quality degrades for images significantly different from the CIFAR-10 distribution

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{deepfake-autoencoder-cifar10-v2,
  author = {ash12321},
  title = {Residual Convolutional Autoencoder for Deepfake Detection},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ash12321/deepfake-autoencoder-cifar10-v2}}
}
```

## Model Card Authors

- **ash12321**

## Model Card Contact

For questions or issues, please open an issue in the repository.

---

*Model trained on December 08, 2025*