---
license: mit
tags:
- pytorch
- autoencoder
- deepfake-detection
- cifar10
- computer-vision
- image-reconstruction
datasets:
- cifar10
metrics:
- mse
library_name: pytorch
---

# Residual Convolutional Autoencoder for Deepfake Detection

## Model Description

This is a **5-stage Residual Convolutional Autoencoder** trained on CIFAR-10 for high-quality image reconstruction. The model achieves strong reconstruction quality and can serve as a foundation for deepfake detection systems.

### Architecture

- **Encoder**: 5 downsampling stages (128→64→32→16→8→4) with residual blocks
- **Latent Dimension**: 512
- **Decoder**: 5 upsampling stages with residual blocks
- **Total Parameters**: 34,849,667
- **Input Size**: 128x128x3 (RGB images)
- **Output Range**: [-1, 1] (Tanh activation)

## Training Details

### Training Data

- **Dataset**: CIFAR-10 (50,000 training images, 10,000 test images)
- **Image Size**: Resized to 128x128
- **Normalization**: Mean=0.5, Std=0.5 (range [-1, 1])

### Training Configuration

- **GPU**: NVIDIA H100 80GB HBM3
- **Batch Size**: 1024
- **Optimizer**: AdamW (lr=1e-3, weight_decay=1e-5)
- **Loss Function**: MSE (Mean Squared Error)
- **Scheduler**: ReduceLROnPlateau (factor=0.5, patience=5)
- **Epochs**: 100
- **Training Time**: ~26 minutes

### Training Results

- **Initial Validation Loss**: 0.266256 (Epoch 1)
- **Final Validation Loss**: 0.004294 (Epoch 100)
- **Final Test Loss**: 0.004290
- **Improvement**: 98.4% reduction in loss

## Performance

| Metric | Value |
|--------|-------|
| Test MSE Loss | 0.004290 |
| Training Time | 26.24 minutes |
| GPU Memory | ~40GB peak |
| Throughput | ~3,600 samples/sec |

## Usage

### Loading the Model

```python
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download

# Define the model architecture
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += residual
        return self.relu(out)


class ResidualConvAutoencoder(nn.Module):
    def __init__(self, latent_dim=512):
        super().__init__()
        # Encoder: 5 stride-2 convolutions, 128x128 -> 4x4
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1),    # 128 -> 64
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            ResidualBlock(64),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 64 -> 32
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            ResidualBlock(128),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), # 32 -> 16
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            ResidualBlock(256),
            nn.Conv2d(256, 512, 4, stride=2, padding=1), # 16 -> 8
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            ResidualBlock(512),
            nn.Conv2d(512, 512, 4, stride=2, padding=1), # 8 -> 4
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
        )
        self.fc_encoder = nn.Linear(512 * 4 * 4, latent_dim)
        self.fc_decoder = nn.Linear(latent_dim, 512 * 4 * 4)
        # Decoder: 5 stride-2 transposed convolutions, 4x4 -> 128x128
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 512, 4, stride=2, padding=1),  # 4 -> 8
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            ResidualBlock(512),
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1),  # 8 -> 16
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            ResidualBlock(256),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),  # 16 -> 32
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            ResidualBlock(128),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # 32 -> 64
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            ResidualBlock(64),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),     # 64 -> 128
            nn.Tanh()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = x.view(x.size(0), -1)
        latent = self.fc_encoder(x)
        x = self.fc_decoder(latent)
        x = x.view(x.size(0), 512, 4, 4)
        reconstructed = self.decoder(x)
        return reconstructed, latent


# Download and load the model
checkpoint_path = hf_hub_download(
    repo_id="ash12321/deepfake-autoencoder-cifar10-v2",
    filename="model_best_checkpoint.ckpt"
)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = ResidualConvAutoencoder(latent_dim=512).to(device)

checkpoint = torch.load(checkpoint_path, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

print("Model loaded successfully!")
```

### Inference Example

```python
from torchvision import transforms
from PIL import Image

# Prepare image: resize and normalize to the model's [-1, 1] input range
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

image = Image.open("your_image.jpg").convert('RGB')
input_tensor = transform(image).unsqueeze(0).to(device)

# Get reconstruction
with torch.no_grad():
    reconstructed, latent = model(input_tensor)

# Denormalize back to [0, 1] for visualization
reconstructed = (reconstructed * 0.5) + 0.5
```

## Reconstruction Examples

![Reconstruction Comparison](reconstruction_comparison.png)

The image above shows 10 original CIFAR-10 test images (top row) and their reconstructions (bottom row), demonstrating the model's reconstruction quality.
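For deepfake or anomaly screening, the per-image reconstruction error can be used directly as a score. The sketch below is illustrative, not part of the released repo: `reconstruction_score` and `flag_suspicious` are hypothetical helper names, and the `0.02` threshold is a placeholder that should be calibrated on the score distribution of known-real images (e.g. a high percentile). The demo at the end uses a stand-in identity model so the snippet runs without downloading the checkpoint; in practice you would pass the loaded `ResidualConvAutoencoder`.

```python
import torch
import torch.nn as nn


def reconstruction_score(model, batch):
    """Per-image mean squared reconstruction error in the model's [-1, 1] space.

    `batch` is an (N, 3, 128, 128) tensor normalized like the training data;
    `model` is any module returning (reconstruction, latent).
    """
    model.eval()
    with torch.no_grad():
        reconstructed, _ = model(batch)
    # Average over channel and spatial dims, keeping the batch dimension
    return ((reconstructed - batch) ** 2).mean(dim=(1, 2, 3))


def flag_suspicious(scores, threshold=0.02):
    """Flag images whose error exceeds a threshold (placeholder value)."""
    return scores > threshold


# Sanity check with a stand-in model that "reconstructs" perfectly,
# so every score is exactly zero and nothing is flagged.
class _Identity(nn.Module):
    def forward(self, x):
        return x, None


scores = reconstruction_score(_Identity(), torch.zeros(4, 3, 128, 128))
flags = flag_suspicious(scores)
```

Scores from real images define the baseline; manipulated or out-of-distribution inputs tend to reconstruct worse, so their error pushes above a well-calibrated threshold.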
## Applications

- **Deepfake Detection**: Use reconstruction error as a signal for detecting manipulated images
- **Anomaly Detection**: Identify out-of-distribution images based on reconstruction quality
- **Image Compression**: Compress images to 512-dimensional latent vectors
- **Feature Extraction**: Use the encoder as a feature extractor for downstream tasks
- **Image Denoising**: Potential for removing noise through reconstruction

## Limitations

- Trained specifically on CIFAR-10 (32x32 images upscaled to 128x128)
- May not generalize well to real-world high-resolution images without fine-tuning
- Optimized for natural images; performance on synthetic/generated images varies
- Reconstruction quality degrades for images significantly different from the CIFAR-10 distribution

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{deepfake-autoencoder-cifar10-v2,
  author = {ash12321},
  title = {Residual Convolutional Autoencoder for Deepfake Detection},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ash12321/deepfake-autoencoder-cifar10-v2}}
}
```

## Model Card Authors

- **ash12321**

## Model Card Contact

For questions or issues, please open an issue in the repository.

---

*Model trained on December 08, 2025*