Residual Convolutional Autoencoder Ensemble

Deep learning models for image reconstruction using residual convolutional autoencoders.

Model Architecture

Two variants of a deep convolutional autoencoder with residual blocks:

Model A: latent_dim=512, dropout=0.15
Model B: latent_dim=768, dropout=0.20

Architecture Details

Input: (B, 3, 256, 256) RGB images in range [-1, 1]
Encoder: 6-layer CNN with residual blocks; spatial size halves at each layer (256→128→64→32→16→8→4)
Latent: Fully connected projection to latent_dim
Decoder: 6-layer transposed-convolution network with residual blocks; spatial size doubles at each layer (4→8→16→32→64→128→256)
Output: Tuple of reconstructed images (B, 3, 256, 256) and latent codes (B, latent_dim)
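
The resolution chain listed above follows from standard convolution size arithmetic. The sketch below assumes each encoder layer is a stride-2 convolution with kernel size 4 and padding 1, and each decoder layer is the matching transposed convolution. These hyperparameters are assumptions chosen for illustration; they are not taken from model.py.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Spatial output size of a transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def chain(start, step, n_layers=6):
    """Apply the same size rule n_layers times and record every size."""
    sizes = [start]
    for _ in range(n_layers):
        sizes.append(step(sizes[-1]))
    return sizes


encoder_sizes = chain(256, conv_out)    # assumed kernel/stride/padding
decoder_sizes = chain(4, deconv_out)    # assumed kernel/stride/padding
print(encoder_sizes)
print(decoder_sizes)
```

Other settings, such as kernel size 3 with padding 1 in the encoder, also halve even input sizes, so the chain alone does not pin down the actual layer configuration.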

Training Details

Dataset: Real images (256x256 resolution)
Loss: MSE (Mean Squared Error)
Optimizer: AdamW with weight decay
Training: 100+ epochs with validation monitoring
Best Validation Loss:
- Model A: 0.025486
- Model B: 0.025033
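
The training setup above (MSE loss, AdamW, validation monitoring, keeping the best checkpoint) can be sketched as follows. This is a minimal illustration, not the author's training script: the tiny stand-in model, the random data, and the lr and weight_decay values are all placeholders, and the real values live in config.json.

```python
import copy

import torch
from torch import nn

torch.manual_seed(0)

# Tiny stand-in autoencoder so the loop runs quickly; NOT the real architecture.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=4, stride=2, padding=1),
    nn.ReLU(),
    nn.ConvTranspose2d(8, 3, kernel_size=4, stride=2, padding=1),
    nn.Tanh(),  # keeps outputs in [-1, 1], matching the input range
)

# Random tensors in [-1, 1] stand in for the real training/validation images.
train_x = torch.rand(16, 3, 32, 32) * 2 - 1
val_x = torch.rand(8, 3, 32, 32) * 2 - 1

criterion = nn.MSELoss()
# lr and weight_decay are placeholder values, not the ones used for the card.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)

best_val, best_state, history = float("inf"), None, []
for epoch in range(3):
    model.train()
    for i in range(0, len(train_x), 8):
        batch = train_x[i:i + 8]
        optimizer.zero_grad()
        loss = criterion(model(batch), batch)  # target is the input itself
        loss.backward()
        optimizer.step()

    # Validation pass: no gradients, dropout/batch-norm in eval mode.
    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(val_x), val_x).item()
    history.append(val_loss)

    # Keep a copy of the weights whenever validation loss improves.
    if val_loss < best_val:
        best_val = val_loss
        best_state = copy.deepcopy(model.state_dict())
```

In a real run, best_state would be written out with torch.save to produce a file such as model_a_best.pth.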

Usage

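import torch
from PIL import Image
from torchvision import transforms
from model import ResidualConvAutoencoder, load_model

# Option 1: Load pre-trained model
model, checkpoint = load_model('model_a_best.pth', latent_dim=512, dropout=0.15)

# Option 2: Create an untrained model from scratch (use instead of Option 1;
# leaving this line active would overwrite the pre-trained weights)
# model = ResidualConvAutoencoder(latent_dim=512, dropout=0.15)

model.eval()  # disable dropout for inference

# Prepare image (normalize to [-1, 1])
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x * 2 - 1)  # [0,1] -> [-1,1]
])

# 'example.jpg' is a placeholder path; convert to RGB so the tensor has 3 channels
image = Image.open('example.jpg').convert('RGB')

# Inference
with torch.no_grad():
    img_tensor = transform(image).unsqueeze(0)  # add batch dimension: (1, 3, 256, 256)
    reconstructed, latent = model(img_tensor)

    # Get reconstruction error (scalar MSE over all pixels)
    error = torch.nn.functional.mse_loss(reconstructed, img_tensor)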
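
With its default reduction, mse_loss returns a single scalar averaged over the whole batch. When scoring several images at once, or when combining Model A and Model B, a per-image error is often more convenient. The sketch below relies only on what the snippet above shows, namely that a model returns a (reconstruction, latent) tuple. The names per_image_mse, ensemble_error, and _Stub are hypothetical, and averaging the two models' errors is one possible way to use the ensemble, not a method documented by the author.

```python
import torch
import torch.nn.functional as F
from torch import nn


def per_image_mse(model, batch):
    """Return a (B,) tensor with one reconstruction MSE per image."""
    with torch.no_grad():
        reconstructed, _latent = model(batch)
    err = F.mse_loss(reconstructed, batch, reduction="none")
    return err.mean(dim=(1, 2, 3))  # average over channels, height, width


def ensemble_error(models, batch):
    """Average the per-image errors of several models."""
    return torch.stack([per_image_mse(m, batch) for m in models]).mean(dim=0)


class _Stub(nn.Module):
    """Hypothetical stand-in with the same (reconstruction, latent) output."""

    def __init__(self, scale, latent_dim):
        super().__init__()
        self.scale, self.latent_dim = scale, latent_dim

    def forward(self, x):
        return x * self.scale, torch.zeros(x.shape[0], self.latent_dim)


batch = torch.rand(4, 3, 256, 256) * 2 - 1  # fake images in [-1, 1]
errors = ensemble_error([_Stub(0.9, 512), _Stub(0.8, 768)], batch)
print(errors.shape)  # torch.Size([4])
```

In practice the two stubs would be replaced by the models returned from load_model for model_a_best.pth and model_b_best.pth.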

Model Files

model_a_best.pth - Model A checkpoint (latent_dim=512)
model_b_best.pth - Model B checkpoint (latent_dim=768)
model.py - Model architecture definition
config.json - Training configuration
training_history.json - Full training metrics

Research Findings

Important Note: These models were trained as image reconstruction autoencoders. Testing revealed they function as enhancement/denoising models rather than anomaly detectors:

✅ Successfully reconstructs natural images
✅ Can denoise corrupted images (JPEG artifacts, blur, contrast)
⚠️ Not suitable for detecting modern AI-generated images
⚠️ Shows negative separation for blurred, contrast-altered, and JPEG-compressed images (it reconstructs them with lower error than real images)

Performance on Synthetic Corruptions

Corruption Type	Separation from Real
Noise Added	+122.1% ✅
Color Shifted	+23.8% ⚠️
Patch Corrupted	+12.6% ❌
JPEG Compressed	-9.8% ❌
Contrast Altered	-90.1% ❌
Blurred	-92.5% ❌

Negative percentages indicate that corrupted images had lower reconstruction error than real images, i.e. the model reconstructs them better (denoising effect).
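
The card does not state how "Separation from Real" is computed. One plausible reading, consistent with the sign convention described above, is the relative difference between the mean reconstruction error on corrupted images and on real images. The sketch below implements that assumed definition; the error values are invented for illustration and are not the measurements behind the table.

```python
from statistics import mean


def separation_pct(real_errors, corrupted_errors):
    """Relative change in mean reconstruction error, in percent.

    Positive: corrupted images reconstruct worse than real ones.
    Negative: corrupted images reconstruct better than real ones.
    """
    real_mean = mean(real_errors)
    return (mean(corrupted_errors) - real_mean) / real_mean * 100.0


# Illustrative per-image MSE values (invented, not from the table).
real = [0.020, 0.030, 0.025]
noisy = [0.050, 0.060, 0.055]
blurred = [0.002, 0.003, 0.0025]

print(f"noise:   {separation_pct(real, noisy):+.1f}%")
print(f"blurred: {separation_pct(real, blurred):+.1f}%")
```

Under this definition a detector that thresholds on reconstruction error can only flag corruption types with clearly positive separation, which is why the blur and contrast rows rule the models out as anomaly detectors.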

Limitations

Not an anomaly detector: Models enhance/denoise rather than faithfully reconstruct
Poor for fake detection: Cannot reliably distinguish modern AI-generated images from real ones
Pixel-space limitations: Modern AI images are statistically similar to real images in pixel space

Recommended Use Cases

✅ Image denoising and enhancement
✅ Feature extraction (latent representations)
✅ Image compression/reconstruction
✅ Transfer learning backbone
❌ Fake image detection (use supervised classifiers instead)
❌ Anomaly detection (use a different approach)
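
For the feature-extraction use case, the latent code returned alongside the reconstruction can serve as an image embedding. The sketch below shows one way to compare images by cosine similarity of their latent codes. _StubAutoencoder and embed are hypothetical names; the stub only mimics the (reconstruction, latent) output so the example runs without the checkpoint files, and it says nothing about the quality of the real embeddings.

```python
import torch
import torch.nn.functional as F
from torch import nn


class _StubAutoencoder(nn.Module):
    """Hypothetical stand-in that mimics the (reconstruction, latent) output."""

    def __init__(self, latent_dim=512):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(4)           # (B, 3, 4, 4)
        self.proj = nn.Linear(3 * 4 * 4, latent_dim)  # (B, latent_dim)

    def forward(self, x):
        latent = self.proj(self.pool(x).flatten(1))
        return x, latent  # identity "reconstruction" keeps the stub small


def embed(model, batch):
    """Return L2-normalized latent codes of shape (B, latent_dim)."""
    model.eval()
    with torch.no_grad():
        _reconstructed, latent = model(batch)
    return F.normalize(latent, dim=1)


torch.manual_seed(0)
batch = torch.rand(4, 3, 256, 256) * 2 - 1  # fake images in [-1, 1]
codes = embed(_StubAutoencoder(latent_dim=512), batch)
similarity = codes @ codes.T  # cosine similarity between every pair of images
```

With the real checkpoints, the stub would be replaced by the model returned from load_model, and similarity could feed a nearest-neighbour search or a small supervised head.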

Citation

If you use these models in your research, please cite:

@misc{residual_autoencoder_ensemble_2024,
  author = {ash12321},
  title = {Residual Convolutional Autoencoder Ensemble},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ash12321/residual-autoencoder-ensemble}}
}

License

MIT License - See LICENSE file for details

Contact

For questions or issues, please open an issue on the Hugging Face model page.
