---
license: mit
tags:
- pytorch
- autoencoder
- deepfake-detection
- cifar10
- computer-vision
- image-reconstruction
datasets:
- cifar10
metrics:
- mse
library_name: pytorch
---

# Residual Convolutional Autoencoder for Deepfake Detection

## Model Description

This is a **5-stage residual convolutional autoencoder** trained on CIFAR-10 for high-quality image reconstruction. The model reaches a low reconstruction error (test MSE 0.00429) and can serve as a foundation for deepfake detection systems.

### Architecture

- **Encoder**: 5 downsampling stages (128→64→32→16→8→4) with residual blocks
- **Latent Dimension**: 512
- **Decoder**: 5 upsampling stages with residual blocks
- **Total Parameters**: 34,849,667
- **Input Size**: 128x128x3 (RGB images)
- **Output Range**: [-1, 1] (Tanh activation)

## Training Details

### Training Data

- **Dataset**: CIFAR-10 (50,000 training images, 10,000 test images)
- **Image Size**: Resized to 128x128
- **Normalization**: Mean=0.5, Std=0.5 (range [-1, 1])

### Training Configuration

- **GPU**: NVIDIA H100 80GB HBM3
- **Batch Size**: 1024
- **Optimizer**: AdamW (lr=1e-3, weight_decay=1e-5)
- **Loss Function**: MSE (Mean Squared Error)
- **Scheduler**: ReduceLROnPlateau (factor=0.5, patience=5)
- **Epochs**: 100
- **Training Time**: ~26 minutes

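As a rough sketch, the optimizer, loss, and scheduler listed above would be set up as follows (the model here is a stand-in; the actual training script is not part of this card):

```python
import torch
import torch.nn as nn

# Stand-in module; the real run used the autoencoder defined in the Usage section.
model = nn.Linear(8, 8)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-5)
criterion = nn.MSELoss()

# Halves the learning rate once the validation loss has plateaued for 5 epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=5)

# Called once per epoch with the epoch's validation loss.
val_loss = 0.01  # placeholder value
scheduler.step(val_loss)
```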
### Training Results

- **Initial Validation Loss**: 0.266256 (Epoch 1)
- **Final Validation Loss**: 0.004294 (Epoch 100)
- **Final Test Loss**: 0.004290
- **Improvement**: 98.4% reduction in validation loss

## Performance

| Metric | Value |
|--------|-------|
| Test MSE Loss | 0.004290 |
| Training Time | 26.24 minutes |
| GPU Memory | ~40GB peak |
| Throughput | ~3,600 samples/sec |

## Usage

### Loading the Model

```python
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download

# Define the model architecture
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += residual
        return self.relu(out)

class ResidualConvAutoencoder(nn.Module):
    def __init__(self, latent_dim=512):
        super().__init__()

        # Encoder
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1),    # 128->64
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            ResidualBlock(64),

            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 64->32
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            ResidualBlock(128),

            nn.Conv2d(128, 256, 4, stride=2, padding=1), # 32->16
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            ResidualBlock(256),

            nn.Conv2d(256, 512, 4, stride=2, padding=1), # 16->8
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            ResidualBlock(512),

            nn.Conv2d(512, 512, 4, stride=2, padding=1), # 8->4
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
        )

        self.fc_encoder = nn.Linear(512 * 4 * 4, latent_dim)
        self.fc_decoder = nn.Linear(latent_dim, 512 * 4 * 4)

        # Decoder
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 512, 4, stride=2, padding=1), # 4->8
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            ResidualBlock(512),

            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1), # 8->16
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            ResidualBlock(256),

            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), # 16->32
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            ResidualBlock(128),

            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 32->64
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            ResidualBlock(64),

            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),    # 64->128
            nn.Tanh()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = x.view(x.size(0), -1)
        latent = self.fc_encoder(x)
        x = self.fc_decoder(latent)
        x = x.view(x.size(0), 512, 4, 4)
        reconstructed = self.decoder(x)
        return reconstructed, latent

# Download and load the model
checkpoint_path = hf_hub_download(
    repo_id="ash12321/deepfake-autoencoder-cifar10-v2",
    filename="model_best_checkpoint.ckpt"
)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = ResidualConvAutoencoder(latent_dim=512).to(device)

checkpoint = torch.load(checkpoint_path, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

print("Model loaded successfully!")
```

### Inference Example

```python
from torchvision import transforms
from PIL import Image

# Prepare image
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

image = Image.open("your_image.jpg").convert('RGB')
input_tensor = transform(image).unsqueeze(0).to(device)

# Get reconstruction
with torch.no_grad():
    reconstructed, latent = model(input_tensor)

# Denormalize for visualization
reconstructed = (reconstructed * 0.5) + 0.5
```

## Reconstruction Examples

![Reconstruction Examples](reconstruction_examples.png)

The image above shows 10 original CIFAR-10 test images (top row) and their reconstructions (bottom row), demonstrating the model's reconstruction quality.

## Applications

- **Deepfake Detection**: Use reconstruction error as a signal for detecting manipulated images
- **Anomaly Detection**: Identify out-of-distribution images based on reconstruction quality
- **Image Compression**: Compress images to 512-dimensional latent vectors
- **Feature Extraction**: Use the encoder as a feature extractor for downstream tasks
- **Image Denoising**: Potential for removing noise through reconstruction

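The detection use cases above boil down to thresholding per-image reconstruction error. A minimal sketch, with simulated reconstructions in place of real model output and an illustrative threshold that would need calibration on real data:

```python
import torch

def reconstruction_error(original: torch.Tensor, reconstructed: torch.Tensor) -> torch.Tensor:
    """Per-image mean squared error over all channels and pixels."""
    return ((original - reconstructed) ** 2).flatten(1).mean(dim=1)

def flag_suspicious(errors: torch.Tensor, threshold: float = 0.01) -> torch.Tensor:
    """Boolean mask: True where reconstruction error exceeds the threshold."""
    return errors > threshold

# Simulated batch: perfect reconstructions except the first image,
# which is perturbed to mimic a poorly reconstructed (manipulated) input.
batch = torch.rand(4, 3, 128, 128)
recon = batch.clone()
recon[0] += 0.5
mask = flag_suspicious(reconstruction_error(batch, recon))
print(mask.tolist())  # [True, False, False, False]
```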
## Limitations

- Trained specifically on CIFAR-10 (32x32 images upscaled to 128x128)
- May not generalize well to real-world high-resolution images without fine-tuning
- Optimized for natural images; performance on synthetic/generated images varies
- Reconstruction quality degrades for images significantly different from the CIFAR-10 distribution

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{deepfake-autoencoder-cifar10-v2,
  author = {ash12321},
  title = {Residual Convolutional Autoencoder for Deepfake Detection},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ash12321/deepfake-autoencoder-cifar10-v2}}
}
```

## Model Card Authors

- **ash12321**

## Model Card Contact

For questions or issues, please open an issue in the repository.

---

*Model trained on December 08, 2025*