Residual Convolutional Autoencoder Ensemble
Deep learning models for image reconstruction using residual convolutional autoencoders.
Model Architecture
Two variants of a deep convolutional autoencoder with residual blocks:
- Model A: latent_dim=512, dropout=0.15
- Model B: latent_dim=768, dropout=0.20
Architecture Details
Input: (B, 3, 256, 256) RGB images in range [-1, 1]
Encoder: 6-layer CNN with residual blocks (256→128→64→32→16→8→4)
Latent: Fully connected projection to latent_dim
Decoder: 6-layer TransposeCNN with residual blocks (4→8→16→32→64→128→256)
Output: (B, 3, 256, 256) Reconstructed images + (B, latent_dim) latent codes
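The resolution chain listed above is consistent with six stride-2 layers in each direction. As a quick sanity check, the sketch below applies the standard output-size formulas for convolution and transposed convolution. The kernel size, stride, and padding values are assumptions chosen so that each layer halves or doubles the resolution; the actual values in model.py may differ.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output size of a standard convolution (floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Output size of a transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def chain(start, step, layers=6):
    """Apply a size formula repeatedly and record every intermediate size."""
    sizes = [start]
    for _ in range(layers):
        sizes.append(step(sizes[-1]))
    return sizes


encoder_sizes = chain(256, conv_out)    # assumed hyperparameters, see above
decoder_sizes = chain(4, deconv_out)
print(encoder_sizes)
print(decoder_sizes)
```

With these assumed settings the encoder ends at a 4x4 feature map, which is what the fully connected projection to latent_dim would then flatten.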
Training Details
- Dataset: Real images (256x256 resolution)
- Loss: MSE (Mean Squared Error)
- Optimizer: AdamW with weight decay
- Training: 100+ epochs with validation monitoring
- Best validation loss (MSE):
  - Model A: 0.025486
  - Model B: 0.025033
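To give the validation losses a more familiar scale, the sketch below converts them to RMSE and PSNR. It assumes the reported loss is the per-element MSE on inputs normalized to [-1, 1], so the data range is 2. PSNR computed from a dataset-average MSE is only a rough indicator and is not the same as averaging per-image PSNR.

```python
import math

DATA_RANGE = 2.0  # assumption: pixel values span [-1, 1]


def rmse(mse):
    """Root mean squared error, in the same units as the pixel values."""
    return math.sqrt(mse)


def psnr(mse, data_range=DATA_RANGE):
    """Peak signal-to-noise ratio in dB for a given MSE and data range."""
    return 10.0 * math.log10(data_range ** 2 / mse)


# Best validation losses as reported in this card
best_val_mse = {"Model A": 0.025486, "Model B": 0.025033}
for name, mse in best_val_mse.items():
    print(f"{name}: RMSE={rmse(mse):.4f}, PSNR={psnr(mse):.2f} dB")
```

Under these assumptions both models land at roughly 22 dB, with Model B marginally ahead.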
Usage
import torch
from PIL import Image
from torchvision import transforms

from model import ResidualConvAutoencoder, load_model

# Option 1: Load a pre-trained model from a checkpoint
model, checkpoint = load_model('model_a_best.pth', latent_dim=512, dropout=0.15)

# Option 2: Create an untrained model from scratch (use instead of Option 1)
# model = ResidualConvAutoencoder(latent_dim=512, dropout=0.15)

model.eval()  # disable dropout for inference

# Prepare image (normalize to [-1, 1])
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x * 2 - 1),  # [0,1] -> [-1,1]
])

# Load an RGB image; 'example.jpg' is a placeholder path
image = Image.open('example.jpg').convert('RGB')

# Inference
with torch.no_grad():
    img_tensor = transform(image).unsqueeze(0)  # add batch dimension: (1, 3, 256, 256)
    reconstructed, latent = model(img_tensor)

    # Get reconstruction error
    error = torch.nn.functional.mse_loss(reconstructed, img_tensor)
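Reconstructions target the same [-1, 1] range as the inputs, so viewing or saving one requires inverting the normalization. The sketch below shows that inverse mapping on plain Python floats. The helper name `to_uint8` is hypothetical, and the clamp is there because the card does not say whether the decoder output is bounded. With tensors the same arithmetic applies elementwise.

```python
def to_uint8(value):
    """Map a value from [-1, 1] to an integer in [0, 255]."""
    unit = (value + 1.0) / 2.0        # inverse of x * 2 - 1
    unit = min(max(unit, 0.0), 1.0)   # clamp in case the output overshoots
    return int(round(unit * 255.0))


# Illustrative values only, including one that overshoots the valid range
pixels = [-1.2, -1.0, 0.5, 1.0]
print([to_uint8(p) for p in pixels])
```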
Model Files
- model_a_best.pth - Model A checkpoint (latent_dim=512)
- model_b_best.pth - Model B checkpoint (latent_dim=768)
- model.py - Model architecture definition
- config.json - Training configuration
- training_history.json - Full training metrics
Research Findings
Important Note: These models were trained as image reconstruction autoencoders. Testing revealed they function as enhancement/denoising models rather than anomaly detectors:
- ✅ Successfully reconstructs natural images
- ✅ Can denoise corrupted images (JPEG artifacts, blur, contrast)
- ⚠️ Not suitable for detecting modern AI-generated images
- ⚠️ Shows negative discrimination for degraded images (reconstructs them better)
Performance on Synthetic Corruptions
| Corruption Type | Separation from Real |
|---|---|
| Noise Added | +122.1% ✅ |
| Color Shifted | +23.8% ⚠️ |
| Patch Corrupted | +12.6% ❌ |
| JPEG Compressed | -9.8% ❌ |
| Contrast Altered | -90.1% ❌ |
| Blurred | -92.5% ❌ |
Negative percentages indicate that the model reconstructs the corrupted images with lower error than the real images (a denoising effect).
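The card does not define "separation from real". One plausible reading, assumed here, is the relative difference in mean reconstruction error (MSE) between a corrupted set and the real set. The function name and the error values in this sketch are illustrative and are not measurements from these models.

```python
from statistics import mean


def separation_pct(real_errors, corrupted_errors):
    """Relative difference in mean reconstruction error, in percent.

    Positive: corrupted images reconstruct worse than real ones.
    Negative: corrupted images reconstruct better than real ones.
    """
    real_mean = mean(real_errors)
    return 100.0 * (mean(corrupted_errors) - real_mean) / real_mean


# Made-up per-image MSE values, for illustration only
real = [0.024, 0.026, 0.025]
noisy = [0.050, 0.060, 0.055]
blurred = [0.002, 0.003, 0.0025]

print(f"noisy:   {separation_pct(real, noisy):+.1f}%")
print(f"blurred: {separation_pct(real, blurred):+.1f}%")
```

Under this reading, a reconstruction-error threshold can only flag corruptions with a clearly positive separation. Blur or contrast changes would score as "more real" than real images.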
Limitations
- Not an anomaly detector: Models enhance/denoise rather than faithfully reconstruct
- Poor for fake detection: Cannot reliably distinguish modern AI-generated images from real ones
- Pixel-space limitations: Modern AI images are statistically similar to real images in pixel space
Recommended Use Cases
- ✅ Image denoising and enhancement
- ✅ Feature extraction (latent representations)
- ✅ Image compression/reconstruction
- ✅ Transfer learning backbone
- ❌ Fake image detection (use supervised classifiers instead)
- ❌ Anomaly detection (use a different approach)
Citation
If you use these models in your research, please cite:
@misc{residual_autoencoder_ensemble_2024,
  author = {ash12321},
  title = {Residual Convolutional Autoencoder Ensemble},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ash12321/residual-autoencoder-ensemble}}
}
License
MIT License - See LICENSE file for details
Contact
For questions or issues, please open an issue on the Hugging Face model page.