
Residual Convolutional Autoencoder Ensemble

Deep learning models for image reconstruction using residual convolutional autoencoders.

Model Architecture

Two variants of a deep convolutional autoencoder with residual blocks:

  • Model A: latent_dim=512, dropout=0.15
  • Model B: latent_dim=768, dropout=0.20

Architecture Details

Input: (B, 3, 256, 256) RGB images in range [-1, 1]
Encoder: 6-layer CNN with residual blocks (256→128→64→32→16→8→4)
Latent: Fully connected projection to latent_dim
Decoder: 6-layer TransposeCNN with residual blocks (4→8→16→32→64→128→256)
Output: (B, 3, 256, 256) Reconstructed images + (B, latent_dim) latent codes
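The resolution chains above follow from standard convolution size arithmetic. The sketch below checks them in plain Python and also works out how many input values each latent code summarizes. The kernel size, stride, and padding values are assumptions chosen so that each layer halves or doubles the resolution; they are not taken from model.py.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output size of a strided convolution (Conv2d formula, dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Output size of a transposed convolution (ConvTranspose2d formula, dilation=1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def chain(start, step, layers=6):
    """Apply the size function `layers` times and record every resolution."""
    sizes = [start]
    for _ in range(layers):
        sizes.append(step(sizes[-1]))
    return sizes


encoder_sizes = chain(256, conv_out)   # assumed hyperparameters, see lead-in
decoder_sizes = chain(4, deconv_out)   # assumed hyperparameters, see lead-in

# Values per input image versus values per latent code.
input_values = 3 * 256 * 256
compression = {latent_dim: input_values / latent_dim for latent_dim in (512, 768)}

print(encoder_sizes)
print(decoder_sizes)
print(compression)
```

Under these assumptions the latent code is 384 times smaller than the input for Model A and 256 times smaller for Model B, which is why fine texture is hard to preserve in the reconstruction.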

Training Details

  • Dataset: Real images (256x256 resolution)
  • Loss: MSE (Mean Squared Error)
  • Optimizer: AdamW with weight decay
  • Training: 100+ epochs with validation monitoring
  • Best Validation Loss:
    • Model A: 0.025486
    • Model B: 0.025033
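To put the validation losses in more familiar terms, the sketch below converts an MSE value into RMSE and PSNR. It assumes the reported loss is the mean squared error over pixels in the [-1, 1] range, so the peak-to-peak data range is 2; the card does not state this explicitly. The PSNR of an averaged MSE is also not the same as the average per-image PSNR, so treat the result as a rough indicator.

```python
import math


def psnr_from_mse(mse, data_range=2.0):
    """PSNR in dB for a given MSE; data_range is max minus min pixel value."""
    return 10.0 * math.log10(data_range ** 2 / mse)


# Best validation losses as reported in this card.
best_val_mse = {"Model A": 0.025486, "Model B": 0.025033}

for name, mse in best_val_mse.items():
    rmse = math.sqrt(mse)
    print(f"{name}: RMSE={rmse:.3f}, PSNR={psnr_from_mse(mse):.2f} dB")
```

Under these assumptions both models land at about 22 dB, with Model B slightly ahead.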

Usage

import torch
from PIL import Image
from torchvision import transforms
from model import ResidualConvAutoencoder, load_model

# Option 1: Load a pre-trained model (Model A shown)
model, checkpoint = load_model('model_a_best.pth', latent_dim=512, dropout=0.15)

# Option 2: Create an untrained model from scratch (use instead of Option 1)
# model = ResidualConvAutoencoder(latent_dim=512, dropout=0.15)

model.eval()  # disable dropout for inference

# Prepare image (normalize to [-1, 1])
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x * 2 - 1)  # [0,1] -> [-1,1]
])

# Inference
image = Image.open('example.jpg').convert('RGB')  # placeholder path; force 3 channels
with torch.no_grad():
    img_tensor = transform(image).unsqueeze(0)  # add batch dim: (1, 3, 256, 256)
    reconstructed, latent = model(img_tensor)

    # Get reconstruction error
    error = torch.nn.functional.mse_loss(reconstructed, img_tensor)
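The transform in the usage snippet maps pixel values from [0, 1] to [-1, 1], so a reconstruction needs the inverse mapping before it can be viewed or saved. A minimal sketch of both directions follows; the helper names are hypothetical and not part of model.py.

```python
def to_model_range(x):
    """Map values from [0, 1] to the [-1, 1] range the model expects."""
    return x * 2 - 1


def to_unit_range(y):
    """Map model outputs from [-1, 1] back to [0, 1]."""
    return (y + 1) / 2


# Round trip on plain floats; the same arithmetic applies element-wise to tensors.
pixels = [0.0, 0.25, 1.0]
restored = [to_unit_range(to_model_range(p)) for p in pixels]
print(restored)
```

For a tensor output, `to_unit_range(reconstructed).clamp(0, 1)` guards against values that fall slightly outside the expected range, which can happen if the decoder's final layer is not bounded.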

Model Files

  • model_a_best.pth - Model A checkpoint (latent_dim=512)
  • model_b_best.pth - Model B checkpoint (latent_dim=768)
  • model.py - Model architecture definition
  • config.json - Training configuration
  • training_history.json - Full training metrics

Research Findings

Important Note: These models were trained as image reconstruction autoencoders. Testing revealed they function as enhancement/denoising models rather than anomaly detectors:

  • ✅ Successfully reconstructs natural images
  • ✅ Can denoise corrupted images (JPEG artifacts, blur, contrast)
  • ⚠️ Not suitable for detecting modern AI-generated images
  • ⚠️ Shows negative discrimination for some degraded images: blurred, contrast-altered, and JPEG-compressed inputs are reconstructed with lower error than clean ones

Performance on Synthetic Corruptions

Corruption Type | Separation from Real
Noise Added | +122.1% ✅
Color Shifted | +23.8% ⚠️
Patch Corrupted | +12.6% ❌
JPEG Compressed | -9.8% ❌
Contrast Altered | -90.1% ❌
Blurred | -92.5% ❌

Negative percentages indicate the model reconstructs corrupted images better than real images (denoising effect).
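The card does not define how "Separation from Real" is computed. One plausible reading, consistent with the sign convention described above, is the relative change in mean reconstruction error between corrupted and real images. The sketch below implements that assumed definition; the error values are illustrative placeholders, not measurements from these models.

```python
from statistics import mean


def separation_pct(real_errors, corrupted_errors):
    """Relative change in mean reconstruction error, in percent (assumed definition)."""
    real_mean = mean(real_errors)
    return 100.0 * (mean(corrupted_errors) - real_mean) / real_mean


# Illustrative per-image MSE values, not results from Model A or Model B.
real = [0.020, 0.030, 0.025]
noisy = [0.050, 0.060, 0.055]
blurred = [0.002, 0.003, 0.0025]

print(f"noisy:   {separation_pct(real, noisy):+.1f}%")
print(f"blurred: {separation_pct(real, blurred):+.1f}%")
```

With this reading, a positive value means the corruption raises reconstruction error and could in principle be flagged by a threshold, while a negative value means the corrupted image is easier to reconstruct than a clean one.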

Limitations

  1. Not an anomaly detector: Models enhance/denoise rather than faithfully reconstruct
  2. Poor for fake detection: Cannot reliably distinguish modern AI-generated images from real ones
  3. Pixel-space limitations: Modern AI images are statistically similar to real images in pixel space

Recommended Use Cases

✅ Image denoising and enhancement
✅ Feature extraction (latent representations)
✅ Image compression/reconstruction
✅ Transfer learning backbone
❌ Fake image detection (use supervised classifiers instead)
❌ Anomaly detection (use a different approach)

Citation

If you use these models in your research, please cite:

@misc{residual_autoencoder_ensemble_2024,
  author = {ash12321},
  title = {Residual Convolutional Autoencoder Ensemble},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ash12321/residual-autoencoder-ensemble}}
}

License

MIT License - See LICENSE file for details

Contact

For questions or issues, please open an issue on the Hugging Face model page.
