---
license: mit
tags:
- pytorch
- autoencoder
- deepfake-detection
- cifar10
- computer-vision
- image-reconstruction
datasets:
- cifar10
metrics:
- mse
library_name: pytorch
---

# Residual Convolutional Autoencoder for Deepfake Detection

## Model Description

This is a **5-stage residual convolutional autoencoder** trained on CIFAR-10 for high-quality image reconstruction. The model reaches a low reconstruction error (test MSE 0.00429) and can serve as a foundation for deepfake detection systems.

### Architecture

- **Encoder**: 5 downsampling stages (128→64→32→16→8→4) with residual blocks
- **Latent Dimension**: 512
- **Decoder**: 5 upsampling stages with residual blocks
- **Total Parameters**: 34,849,667
- **Input Size**: 128x128x3 (RGB images)
- **Output Range**: [-1, 1] (Tanh activation)

## Training Details

### Training Data

- **Dataset**: CIFAR-10 (50,000 training images, 10,000 test images)
- **Image Size**: Resized to 128x128
- **Normalization**: Mean=0.5, Std=0.5 (range [-1, 1])

### Training Configuration

- **GPU**: NVIDIA H100 80GB HBM3
- **Batch Size**: 1024
- **Optimizer**: AdamW (lr=1e-3, weight_decay=1e-5)
- **Loss Function**: MSE (Mean Squared Error)
- **Scheduler**: ReduceLROnPlateau (factor=0.5, patience=5)
- **Epochs**: 100
- **Training Time**: ~26 minutes

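As a rough sketch, the optimizer, loss, and scheduler listed above would be set up as follows (the model here is a stand-in; the actual training script is not part of this card):

```python
import torch
import torch.nn as nn

# Stand-in module; the real run used the autoencoder defined in the Usage section.
model = nn.Linear(8, 8)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-5)
criterion = nn.MSELoss()

# Halves the learning rate once the validation loss has plateaued for 5 epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=5)

# Called once per epoch with the epoch's validation loss.
val_loss = 0.01  # placeholder value
scheduler.step(val_loss)
```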
### Training Results

- **Initial Validation Loss**: 0.266256 (Epoch 1)
- **Final Validation Loss**: 0.004294 (Epoch 100)
- **Final Test Loss**: 0.004290
- **Improvement**: 98.4% reduction in validation loss

## Performance

| Metric | Value |
|--------|-------|
| Test MSE Loss | 0.004290 |
| Training Time | 26.24 minutes |
| GPU Memory | ~40GB peak |
| Throughput | ~3,600 samples/sec |

## Usage

### Loading the Model

```python
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download

# Define the model architecture
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += residual
        return self.relu(out)

class ResidualConvAutoencoder(nn.Module):
    def __init__(self, latent_dim=512):
        super().__init__()

        # Encoder
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1),    # 128->64
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            ResidualBlock(64),

            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 64->32
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            ResidualBlock(128),

            nn.Conv2d(128, 256, 4, stride=2, padding=1), # 32->16
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            ResidualBlock(256),

            nn.Conv2d(256, 512, 4, stride=2, padding=1), # 16->8
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            ResidualBlock(512),

            nn.Conv2d(512, 512, 4, stride=2, padding=1), # 8->4
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
        )

        self.fc_encoder = nn.Linear(512 * 4 * 4, latent_dim)
        self.fc_decoder = nn.Linear(latent_dim, 512 * 4 * 4)

        # Decoder
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 512, 4, stride=2, padding=1), # 4->8
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            ResidualBlock(512),

            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1), # 8->16
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            ResidualBlock(256),

            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), # 16->32
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            ResidualBlock(128),

            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 32->64
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            ResidualBlock(64),

            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),    # 64->128
            nn.Tanh()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = x.view(x.size(0), -1)
        latent = self.fc_encoder(x)
        x = self.fc_decoder(latent)
        x = x.view(x.size(0), 512, 4, 4)
        reconstructed = self.decoder(x)
        return reconstructed, latent

# Download and load the model
checkpoint_path = hf_hub_download(
    repo_id="ash12321/deepfake-autoencoder-cifar10-v2",
    filename="model_best_checkpoint.ckpt"
)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = ResidualConvAutoencoder(latent_dim=512).to(device)

checkpoint = torch.load(checkpoint_path, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

print("Model loaded successfully!")
```

### Inference Example

```python
from torchvision import transforms
from PIL import Image

# Prepare image
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

image = Image.open("your_image.jpg").convert('RGB')
input_tensor = transform(image).unsqueeze(0).to(device)

# Get reconstruction
with torch.no_grad():
    reconstructed, latent = model(input_tensor)

# Denormalize for visualization
reconstructed = (reconstructed * 0.5) + 0.5
```

## Reconstruction Examples

![Reconstruction Examples](reconstruction_examples.png)

The image above shows 10 original CIFAR-10 test images (top row) and their reconstructions (bottom row), demonstrating the model's reconstruction quality.

## Applications

- **Deepfake Detection**: Use reconstruction error as a signal for detecting manipulated images
- **Anomaly Detection**: Identify out-of-distribution images based on reconstruction quality
- **Image Compression**: Compress images to 512-dimensional latent vectors
- **Feature Extraction**: Use the encoder as a feature extractor for downstream tasks
- **Image Denoising**: Potential for removing noise through reconstruction

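The detection use cases above boil down to thresholding per-image reconstruction error. A minimal sketch, with simulated reconstructions in place of real model output and an illustrative threshold that would need calibration on real data:

```python
import torch

def reconstruction_error(original: torch.Tensor, reconstructed: torch.Tensor) -> torch.Tensor:
    """Per-image mean squared error over all channels and pixels."""
    return ((original - reconstructed) ** 2).flatten(1).mean(dim=1)

def flag_suspicious(errors: torch.Tensor, threshold: float = 0.01) -> torch.Tensor:
    """Boolean mask: True where reconstruction error exceeds the threshold."""
    return errors > threshold

# Simulated batch: perfect reconstructions except the first image,
# which is perturbed to mimic a poorly reconstructed (manipulated) input.
batch = torch.rand(4, 3, 128, 128)
recon = batch.clone()
recon[0] += 0.5
mask = flag_suspicious(reconstruction_error(batch, recon))
print(mask.tolist())  # [True, False, False, False]
```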
## Limitations

- Trained specifically on CIFAR-10 (32x32 images upscaled to 128x128)
- May not generalize well to real-world high-resolution images without fine-tuning
- Optimized for natural images; performance on synthetic/generated images varies
- Reconstruction quality degrades for images significantly different from the CIFAR-10 distribution

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{deepfake-autoencoder-cifar10-v2,
  author = {ash12321},
  title = {Residual Convolutional Autoencoder for Deepfake Detection},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ash12321/deepfake-autoencoder-cifar10-v2}}
}
```

## Model Card Authors

- **ash12321**

## Model Card Contact

For questions or issues, please open an issue in the repository.

---

*Model trained on December 08, 2025*