---
license: mit
tags:
- pytorch
- autoencoder
- deepfake-detection
- cifar10
- computer-vision
- image-reconstruction
datasets:
- cifar10
metrics:
- mse
library_name: pytorch
---
# Residual Convolutional Autoencoder for Deepfake Detection
## Model Description
This is a **5-stage Residual Convolutional Autoencoder** trained on CIFAR-10 for image reconstruction. It reaches a test MSE of 0.004290 on inputs normalized to [-1, 1] and is intended as a foundation for deepfake detection systems.
### Architecture
- **Encoder**: 5 stride-2 downsampling stages (128→64→32→16→8→4), with a residual block after each of the first four
- **Latent Dimension**: 512
- **Decoder**: 5 stride-2 upsampling stages (4→8→16→32→64→128), with a residual block after each of the first four
- **Total Parameters**: 34,849,667
- **Input Size**: 128x128x3 (RGB images)
- **Output Range**: [-1, 1] (Tanh activation)
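The spatial sizes and the parameter total above can be checked by hand. The following is a minimal sketch in plain Python that assumes the layer layout from the code listing under Usage (4x4 convolutions with stride 2 and padding 1, 3x3 convolutions inside the residual blocks, and no residual block after the last encoder stage or the final output layer). The helper names are illustrative and not part of the released code.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Output side length of a Conv2d layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel=4, stride=2, padding=1):
    """Output side length of a ConvTranspose2d layer (output_padding 0)."""
    return (size - 1) * stride - 2 * padding + kernel


def conv_params(c_in, c_out, kernel):
    """Weights plus biases; the count is the same for Conv2d and ConvTranspose2d."""
    return c_in * c_out * kernel * kernel + c_out


def bn_params(channels):
    """Learnable scale and shift of BatchNorm2d."""
    return 2 * channels


def res_block_params(channels):
    """Two 3x3 convolutions, each followed by batch norm."""
    return 2 * conv_params(channels, channels, 3) + 2 * bn_params(channels)


# Spatial sizes through the encoder and decoder.
encoder_sizes = [128]
decoder_sizes = [4]
for _ in range(5):
    encoder_sizes.append(conv_out(encoder_sizes[-1]))
    decoder_sizes.append(deconv_out(decoder_sizes[-1]))
# encoder_sizes == [128, 64, 32, 16, 8, 4]
# decoder_sizes == [4, 8, 16, 32, 64, 128]

# Parameter count, stage by stage.
total = 0
enc_channels = [3, 64, 128, 256, 512, 512]
for i, (c_in, c_out) in enumerate(zip(enc_channels, enc_channels[1:])):
    total += conv_params(c_in, c_out, 4) + bn_params(c_out)
    if i < 4:  # the last encoder stage has no residual block
        total += res_block_params(c_out)

latent_dim, flat = 512, 512 * 4 * 4
total += flat * latent_dim + latent_dim  # fc_encoder
total += latent_dim * flat + flat        # fc_decoder

dec_channels = [512, 512, 256, 128, 64, 3]
for i, (c_in, c_out) in enumerate(zip(dec_channels, dec_channels[1:])):
    total += conv_params(c_in, c_out, 4)
    if i < 4:  # the output layer is followed only by Tanh
        total += bn_params(c_out) + res_block_params(c_out)

print(f"{total:,}")  # 34,849,667
```

The two fully connected layers around the latent vector account for about 8.4M of the parameters; the rest is split almost evenly between encoder and decoder.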
## Training Details
### Training Data
- **Dataset**: CIFAR-10 (50,000 training images, 10,000 test images)
- **Image Size**: Resized to 128x128
- **Normalization**: Mean=0.5, Std=0.5 (range [-1, 1])
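As a small worked example of the normalization above: with mean 0.5 and standard deviation 0.5, a pixel value in [0, 1] is mapped to [-1, 1], which matches the Tanh output range of the decoder. The function names below are illustrative only.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Map a pixel value in [0, 1] to [-1, 1]."""
    return (pixel - mean) / std


def denormalize(value, mean=0.5, std=0.5):
    """Inverse mapping, from [-1, 1] back to [0, 1]."""
    return value * std + mean


low, high = normalize(0.0), normalize(1.0)   # -1.0 and 1.0
round_trip = denormalize(normalize(0.25))    # 0.25

# CIFAR-10 images are 32x32, so resizing to 128x128 is a 4x upscale per side.
upscale_factor = 128 // 32
```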
### Training Configuration
- **GPU**: NVIDIA H100 80GB HBM3
- **Batch Size**: 1024
- **Optimizer**: AdamW (lr=1e-3, weight_decay=1e-5)
- **Loss Function**: MSE (Mean Squared Error)
- **Scheduler**: ReduceLROnPlateau (factor=0.5, patience=5)
- **Epochs**: 100
- **Training Time**: ~26 minutes
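The original training script is not included in this card. The sketch below only shows how the listed optimizer, loss, and scheduler settings fit together in PyTorch. The `nn.Linear` stand-in model, the random batch, and the constant validation loss are placeholders chosen for illustration, not values from the actual run.

```python
import torch
import torch.nn as nn

# Stand-in model so the sketch runs on its own; the real run used the autoencoder.
model = nn.Linear(8, 8)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5
)
criterion = nn.MSELoss()


def train_step(batch):
    """One reconstruction step: the target is the input itself."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(batch), batch)
    loss.backward()
    optimizer.step()
    return loss.item()


first_loss = train_step(torch.randn(16, 8))

# The scheduler is stepped once per epoch with the validation loss.
# Feeding it a loss that never improves shows the plateau behaviour:
# the first call sets the baseline, and with patience=5 the learning
# rate is halved on the sixth epoch without improvement.
for _ in range(7):
    scheduler.step(1.0)  # placeholder validation loss

current_lr = optimizer.param_groups[0]["lr"]  # 5e-4 after one reduction
```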
### Training Results
- **Initial Validation Loss**: 0.266256 (Epoch 1)
- **Final Validation Loss**: 0.004294 (Epoch 100)
- **Final Test Loss**: 0.004290
- **Improvement**: 98.4% reduction in validation loss from epoch 1 to epoch 100
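The figures above can be put in more familiar terms with a little arithmetic. This sketch recomputes the relative reduction and converts the test MSE to an RMS error and an approximate PSNR. The PSNR is derived here, not reported by the author; it assumes a data range of 2 (values in [-1, 1]) and is computed from the mean MSE, which is not the same as averaging per-image PSNR.

```python
import math

initial_val_loss = 0.266256  # epoch 1
final_val_loss = 0.004294    # epoch 100
test_mse = 0.004290

# Relative reduction in validation loss, roughly 0.984.
reduction = (initial_val_loss - final_val_loss) / initial_val_loss

# Root-mean-square error per value on the [-1, 1] scale, roughly 0.0655.
rmse = math.sqrt(test_mse)

# The same error expressed in 8-bit grey levels: halve to get the [0, 1] scale,
# then multiply by 255.
rmse_grey_levels = rmse / 2 * 255

# PSNR in dB for a signal with peak-to-peak range 2, roughly 29.7 dB.
psnr_db = 10 * math.log10(2.0 ** 2 / test_mse)
```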
## Performance
| Metric | Value |
|--------|-------|
| Test MSE Loss | 0.004290 |
| Training Time | 26.24 minutes |
| GPU Memory | ~40GB peak |
| Throughput | ~3,600 samples/sec |
## Usage
### Loading the Model
```python
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download


# Define the model architecture
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += residual
        return self.relu(out)


class ResidualConvAutoencoder(nn.Module):
    def __init__(self, latent_dim=512):
        super().__init__()
        # Encoder
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1),  # 128->64
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            ResidualBlock(64),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 64->32
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            ResidualBlock(128),
            nn.Conv2d(128, 256, 4, stride=2, padding=1),  # 32->16
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            ResidualBlock(256),
            nn.Conv2d(256, 512, 4, stride=2, padding=1),  # 16->8
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            ResidualBlock(512),
            nn.Conv2d(512, 512, 4, stride=2, padding=1),  # 8->4
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
        )
        self.fc_encoder = nn.Linear(512 * 4 * 4, latent_dim)
        self.fc_decoder = nn.Linear(latent_dim, 512 * 4 * 4)
        # Decoder
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 512, 4, stride=2, padding=1),  # 4->8
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            ResidualBlock(512),
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1),  # 8->16
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            ResidualBlock(256),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),  # 16->32
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            ResidualBlock(128),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 32->64
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            ResidualBlock(64),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),  # 64->128
            nn.Tanh()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = x.view(x.size(0), -1)
        latent = self.fc_encoder(x)
        x = self.fc_decoder(latent)
        x = x.view(x.size(0), 512, 4, 4)
        reconstructed = self.decoder(x)
        return reconstructed, latent


# Download and load the model
checkpoint_path = hf_hub_download(
    repo_id="ash12321/deepfake-autoencoder-cifar10-v2",
    filename="model_best_checkpoint.ckpt"
)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = ResidualConvAutoencoder(latent_dim=512).to(device)
checkpoint = torch.load(checkpoint_path, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()
print("Model loaded successfully!")
```
### Inference Example
```python
from torchvision import transforms
from PIL import Image

# Prepare image
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])
image = Image.open("your_image.jpg").convert('RGB')
input_tensor = transform(image).unsqueeze(0).to(device)

# Get reconstruction
with torch.no_grad():
    reconstructed, latent = model(input_tensor)

# Denormalize for visualization
reconstructed = (reconstructed * 0.5) + 0.5
```
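To turn a reconstruction into a single number per image, one option is the per-image mean squared error between input and output. The sketch below is an illustration rather than part of the released code: `per_image_mse` is a hypothetical helper, and the tensors are synthetic so the block runs without the checkpoint. To stay on the same scale as the training loss, compare the normalized input with the raw model output, before the denormalization step shown above.

```python
import torch


def per_image_mse(original, reconstructed):
    """Mean squared error per image for batches shaped (N, C, H, W)."""
    if original.shape != reconstructed.shape:
        raise ValueError("original and reconstructed must have the same shape")
    squared_error = (original - reconstructed) ** 2
    # Average over channels and pixels, keeping one value per image.
    return squared_error.flatten(start_dim=1).mean(dim=1)


# Synthetic stand-ins for a normalized input batch and the model output.
original = torch.zeros(2, 3, 4, 4)
reconstructed = torch.zeros(2, 3, 4, 4)
reconstructed[0] += 0.5  # first image is off by 0.5 everywhere

errors = per_image_mse(original, reconstructed)
# errors -> tensor([0.2500, 0.0000])
```

With the real model, `original` would be `input_tensor` and `reconstructed` the first value returned by `model(input_tensor)`.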
## Reconstruction Examples
![Reconstruction Comparison](reconstruction_comparison.png)
The image above shows 10 original CIFAR-10 test images (top row) and their reconstructions (bottom row).
## Applications
- **Deepfake Detection**: Use reconstruction error as a signal for detecting manipulated images
- **Anomaly Detection**: Identify out-of-distribution images based on reconstruction quality
- **Image Compression**: Compress images to 512-dimensional latent vectors
- **Feature Extraction**: Use the encoder as a feature extractor for downstream tasks
- **Image Denoising**: Potential for removing noise through reconstruction
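For the detection and anomaly use cases, reconstruction errors need a decision rule. One simple option, sketched below in plain Python, is to calibrate a threshold on errors from images known to be real and flag anything above it. This rests on the assumption that manipulated or out-of-distribution images reconstruct worse, which this card does not evaluate. The function names and all numbers are hypothetical placeholders, not measurements from this model.

```python
def percentile_threshold(reference_errors, percent=95):
    """Nearest-rank percentile of reconstruction errors from known-real images."""
    if not reference_errors:
        raise ValueError("need at least one reference error")
    ordered = sorted(reference_errors)
    # Integer ceiling of percent * n / 100, clamped to a valid 1-based rank.
    rank = max(1, min(len(ordered), (percent * len(ordered) + 99) // 100))
    return ordered[rank - 1]


def flag_high_error(errors, threshold):
    """True where an image reconstructs worse than the threshold."""
    return [error > threshold for error in errors]


# Placeholder values for illustration only.
real_errors = [i / 1000 for i in range(1, 21)]  # 0.001 ... 0.020
threshold = percentile_threshold(real_errors, percent=95)

candidate_errors = [0.004, 0.030]
flags = flag_high_error(candidate_errors, threshold)
# flags -> [False, True]
```

A percentile threshold fixes the false-positive rate on the reference set (about 5% here) but says nothing about how many manipulated images are caught; that requires labeled examples of both classes.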
## Limitations
- Trained specifically on CIFAR-10 (32x32 images upscaled to 128x128)
- May not generalize well to real-world high-resolution images without fine-tuning
- Optimized for natural images; performance on synthetic/generated images varies
- Reconstruction quality degrades for images significantly different from CIFAR-10 distribution
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{deepfake-autoencoder-cifar10-v2,
  author = {ash12321},
  title = {Residual Convolutional Autoencoder for Deepfake Detection},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ash12321/deepfake-autoencoder-cifar10-v2}}
}
```
## Model Card Authors
- **ash12321**
## Model Card Contact
For questions or issues, please open an issue in the repository.
---
*Model trained on December 08, 2025*