+---
+license: mit
+tags:
+- pytorch
+- autoencoder
+- deepfake-detection
+- cifar10
+- computer-vision
+- image-reconstruction
+datasets:
+- cifar10
+metrics:
+- mse
+library_name: pytorch
+---
+# Residual Convolutional Autoencoder for Deepfake Detection
+## Model Description
+This is a **5-stage residual convolutional autoencoder** trained on CIFAR-10 for image reconstruction. It reaches a test MSE of 0.004290 on images normalized to [-1, 1], and its reconstruction error can serve as a building block for deepfake detection systems.
+### Architecture
+- **Encoder**: 5 stride-2 downsampling stages (128→64→32→16→8→4); a residual block follows each of the first four
+- **Latent Dimension**: 512
+- **Decoder**: 5 stride-2 upsampling stages (4→8→16→32→64→128); a residual block follows each of the first four
+- **Total Parameters**: 34,849,667
+- **Input Size**: 128x128x3 (RGB images)
+- **Output Range**: [-1, 1] (Tanh activation)
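
The parameter total and the 128→4 spatial progression can be checked by hand from the layer shapes used in the loading code under Usage. The sketch below is plain Python arithmetic, not part of the released code; the helper names (`conv_params`, `bn_params`, `res_block_params`, `conv_out`) are illustrative only.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a Conv2d or ConvTranspose2d with a k x k kernel."""
    return c_in * c_out * k * k + c_out


def bn_params(c):
    """Learnable scale and shift of BatchNorm2d (running stats are buffers)."""
    return 2 * c


def res_block_params(c):
    """Two 3x3 convolutions, each followed by BatchNorm2d."""
    return 2 * (conv_params(c, c, 3) + bn_params(c))


def conv_out(size, k=4, stride=2, pad=1):
    """Output side length of a strided convolution."""
    return (size + 2 * pad - k) // stride + 1


# Spatial size after each of the five stride-2 encoder stages.
sizes = [128]
for _ in range(5):
    sizes.append(conv_out(sizes[-1]))

# Encoder: conv + batch norm per stage, residual block after the first four.
enc_channels = [3, 64, 128, 256, 512, 512]
encoder = 0
for i in range(5):
    c_in, c_out = enc_channels[i], enc_channels[i + 1]
    encoder += conv_params(c_in, c_out, 4) + bn_params(c_out)
    if i < 4:
        encoder += res_block_params(c_out)

# Decoder: the final transposed convolution has no batch norm or residual block.
dec_channels = [512, 512, 256, 128, 64, 3]
decoder = 0
for i in range(5):
    c_in, c_out = dec_channels[i], dec_channels[i + 1]
    decoder += conv_params(c_in, c_out, 4)
    if i < 4:
        decoder += bn_params(c_out) + res_block_params(c_out)

# Two linear layers around the 512-dimensional latent vector.
flat = 512 * 4 * 4
latent_dim = 512
linear = (flat * latent_dim + latent_dim) + (latent_dim * flat + flat)

total = encoder + linear + decoder
print(sizes)
print(f"{total:,}")
```

The two linear layers account for roughly 8.4 million of the parameters, so changing the latent dimension shifts the total noticeably.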
+## Training Details
+### Training Data
+- **Dataset**: CIFAR-10 (50,000 training images, 10,000 test images)
+- **Image Size**: Resized to 128x128
+- **Normalization**: Mean=0.5, Std=0.5 (range [-1, 1])
+### Training Configuration
+- **GPU**: NVIDIA H100 80GB HBM3
+- **Batch Size**: 1024
+- **Optimizer**: AdamW (lr=1e-3, weight_decay=1e-5)
+- **Loss Function**: MSE (Mean Squared Error)
+- **Scheduler**: ReduceLROnPlateau (factor=0.5, patience=5)
+- **Epochs**: 100
+- **Training Time**: ~26 minutes
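
The optimizer, loss, and scheduler settings listed above map onto PyTorch roughly as follows. This is a minimal sketch, not the original training script: the stand-in `model`, the random batch, and `mode="min"` for the scheduler are assumptions made so the snippet runs on its own.

```python
import torch
import torch.nn as nn

# Stand-in module so the snippet is self-contained; swap in the real autoencoder.
model = nn.Conv2d(3, 3, kernel_size=3, padding=1)

criterion = nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5
)

# One illustrative step on random data in the [-1, 1] range.
batch = torch.rand(8, 3, 128, 128) * 2 - 1
optimizer.zero_grad()
loss = criterion(model(batch), batch)
loss.backward()
optimizer.step()

# ReduceLROnPlateau is stepped with the monitored metric, typically once per
# epoch on the validation loss; the training loss is used here for brevity.
scheduler.step(loss.item())
```

With `factor=0.5` and `patience=5`, the learning rate is halved after the monitored loss fails to improve for more than five consecutive scheduler steps.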
+### Training Results
+- **Initial Validation Loss**: 0.266256 (Epoch 1)
+- **Final Validation Loss**: 0.004294 (Epoch 100)
+- **Final Test Loss**: 0.004290
+- **Improvement**: 98.4% reduction in validation loss (0.266256 → 0.004294)
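
The 98.4% figure follows from the first and last validation losses. A short check, using only the numbers reported above:

```python
initial_val_loss = 0.266256  # epoch 1
final_val_loss = 0.004294    # epoch 100

# Relative reduction between the first and last epoch.
reduction = (initial_val_loss - final_val_loss) / initial_val_loss
print(f"{reduction:.1%}")
```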
+## Performance
+| Metric | Value |
+|--------|-------|
+| Test MSE Loss | 0.004290 |
+| Training Time | 26.24 minutes |
+| GPU Memory | ~40 GB peak |
+| Throughput | ~3,600 samples/sec |
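
For comparison with work that reports PSNR, the test MSE can be converted to decibels. The sketch assumes the loss was computed on tensors normalized to [-1, 1], as described under Training Data, so the data range is 2. It gives the PSNR of the mean MSE, which is not the same as the mean of per-image PSNR values.

```python
import math

test_mse = 0.004290  # reported test loss, assumed to be on [-1, 1] tensors
data_range = 2.0     # width of the [-1, 1] interval

# PSNR = 10 * log10(range^2 / MSE)
psnr_db = 10 * math.log10(data_range ** 2 / test_mse)
print(f"{psnr_db:.1f} dB")
```

Under that assumption the result is roughly 29.7 dB.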
+## Usage
+### Loading the Model
+```python
+import torch
+import torch.nn as nn
+from huggingface_hub import hf_hub_download
+# Define the model architecture
+class ResidualBlock(nn.Module):
+    def __init__(self, channels):
+        super().__init__()
+        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
+        self.bn1 = nn.BatchNorm2d(channels)
+        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
+        self.bn2 = nn.BatchNorm2d(channels)
+        self.relu = nn.ReLU(inplace=True)
+    def forward(self, x):
+        residual = x
+        out = self.relu(self.bn1(self.conv1(x)))
+        out = self.bn2(self.conv2(out))
+        out += residual
+        return self.relu(out)
+class ResidualConvAutoencoder(nn.Module):
+    def __init__(self, latent_dim=512):
+        super().__init__()
+        # Encoder
+        self.encoder = nn.Sequential(
+            nn.Conv2d(3, 64, 4, stride=2, padding=1),  # 128->64
+            nn.BatchNorm2d(64),
+            nn.ReLU(inplace=True),
+            ResidualBlock(64),
+            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 64->32
+            nn.BatchNorm2d(128),
+            nn.ReLU(inplace=True),
+            ResidualBlock(128),
+            nn.Conv2d(128, 256, 4, stride=2, padding=1),  # 32->16
+            nn.BatchNorm2d(256),
+            nn.ReLU(inplace=True),
+            ResidualBlock(256),
+            nn.Conv2d(256, 512, 4, stride=2, padding=1),  # 16->8
+            nn.BatchNorm2d(512),
+            nn.ReLU(inplace=True),
+            ResidualBlock(512),
+            nn.Conv2d(512, 512, 4, stride=2, padding=1),  # 8->4
+            nn.BatchNorm2d(512),
+            nn.ReLU(inplace=True),
+        )
+        self.fc_encoder = nn.Linear(512 * 4 * 4, latent_dim)
+        self.fc_decoder = nn.Linear(latent_dim, 512 * 4 * 4)
+        # Decoder
+        self.decoder = nn.Sequential(
+            nn.ConvTranspose2d(512, 512, 4, stride=2, padding=1),  # 4->8
+            nn.BatchNorm2d(512),
+            nn.ReLU(inplace=True),
+            ResidualBlock(512),
+            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1),  # 8->16
+            nn.BatchNorm2d(256),
+            nn.ReLU(inplace=True),
+            ResidualBlock(256),
+            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),  # 16->32
+            nn.BatchNorm2d(128),
+            nn.ReLU(inplace=True),
+            ResidualBlock(128),
+            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 32->64
+            nn.BatchNorm2d(64),
+            nn.ReLU(inplace=True),
+            ResidualBlock(64),
+            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),  # 64->128
+            nn.Tanh()
+        )
+    def forward(self, x):
+        x = self.encoder(x)
+        x = x.view(x.size(0), -1)
+        latent = self.fc_encoder(x)
+        x = self.fc_decoder(latent)
+        x = x.view(x.size(0), 512, 4, 4)
+        reconstructed = self.decoder(x)
+        return reconstructed, latent
+# Download and load the model
+checkpoint_path = hf_hub_download(
+    repo_id="ash12321/deepfake-autoencoder-cifar10-v2",
+    filename="model_best_checkpoint.ckpt"
+)
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+model = ResidualConvAutoencoder(latent_dim=512).to(device)
+checkpoint = torch.load(checkpoint_path, map_location=device)
+model.load_state_dict(checkpoint['model_state_dict'])
+model.eval()
+print("Model loaded successfully!")
+```
+### Inference Example
+```python
+from torchvision import transforms
+from PIL import Image
+# Prepare image
+transform = transforms.Compose([
+    transforms.Resize((128, 128)),
+    transforms.ToTensor(),
+    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
+])
+image = Image.open("your_image.jpg").convert('RGB')
+input_tensor = transform(image).unsqueeze(0).to(device)
+# Get reconstruction
+with torch.no_grad():
+    reconstructed, latent = model(input_tensor)
+# Denormalize for visualization
+reconstructed = (reconstructed * 0.5) + 0.5
+```
+## Reconstruction Examples
+![Reconstruction Comparison](reconstruction_comparison.png)
+The image above shows 10 original CIFAR-10 test images (top row) and their reconstructions (bottom row), giving a qualitative view of the reconstruction quality behind the reported test MSE.
+## Applications
+- **Deepfake Detection**: Use reconstruction error as a signal for detecting manipulated images
+- **Anomaly Detection**: Identify out-of-distribution images based on reconstruction quality
+- **Image Compression**: Compress each 128x128x3 image (49,152 values) to a 512-dimensional latent vector, a 96:1 reduction in value count
+- **Feature Extraction**: Use the encoder as a feature extractor for downstream tasks
+- **Image Denoising**: Potential for removing noise through reconstruction
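
The deepfake and anomaly detection uses above both reduce to scoring each image by its reconstruction error and comparing that score to a threshold. The sketch below shows one way to do this; it is an illustration under stated assumptions, not an evaluated detector. The helper names, the 0.95 quantile, the random placeholder tensors, and the `NoisyIdentity` stand-in model are all hypothetical. In practice the loaded `ResidualConvAutoencoder` and real image batches would take their place.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def reconstruction_scores(model, batch):
    """Per-image MSE between inputs and reconstructions, shape (N,)."""
    reconstructed, _latent = model(batch)
    return ((reconstructed - batch) ** 2).flatten(start_dim=1).mean(dim=1)


def fit_threshold(reference_scores, quantile=0.95):
    """Score below which `quantile` of the reference images fall."""
    return torch.quantile(reference_scores, quantile).item()


class NoisyIdentity(nn.Module):
    """Hypothetical stand-in that mimics the (reconstruction, latent) output."""

    def forward(self, x):
        noisy = (x + 0.01 * torch.randn_like(x)).clamp(-1, 1)
        return noisy, x.new_zeros(x.size(0), 512)


torch.manual_seed(0)
model = NoisyIdentity().eval()

# Placeholders in the [-1, 1] range; use real, preprocessed images in practice.
reference = torch.rand(16, 3, 128, 128) * 2 - 1   # images known to be genuine
candidates = torch.rand(4, 3, 128, 128) * 2 - 1   # images to screen

threshold = fit_threshold(reconstruction_scores(model, reference))
flags = reconstruction_scores(model, candidates) > threshold
print(flags.tolist())
```

Fitting the threshold on a held-out set of genuine images fixes the expected false-positive rate (about 5% for the 0.95 quantile). How well the scores separate manipulated images depends on the data and would need to be measured separately.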
+## Limitations
+- Trained specifically on CIFAR-10 (32x32 images upscaled to 128x128)
+- May not generalize well to real-world high-resolution images without fine-tuning
+- Optimized for natural images; performance on synthetic/generated images varies
+- Reconstruction quality degrades for images significantly different from CIFAR-10 distribution
+## Citation
+If you use this model in your research, please cite:
+```bibtex
+@misc{deepfake-autoencoder-cifar10-v2,
+  author = {ash12321},
+  title = {Residual Convolutional Autoencoder for Deepfake Detection},
+  year = {2025},
+  publisher = {Hugging Face},
+  howpublished = {\url{https://huggingface.co/ash12321/deepfake-autoencoder-cifar10-v2}}
+}
+```
+## Model Card Authors
+- **ash12321**
+## Model Card Contact
+For questions or issues, please open an issue in the repository.
+---
+*Model trained on December 08, 2025*