# Residual Convolutional Autoencoder Ensemble

Deep learning models for image reconstruction using residual convolutional autoencoders.

## Model Architecture

Two variants of a deep convolutional autoencoder with residual blocks:

- **Model A**: latent_dim=512, dropout=0.15
- **Model B**: latent_dim=768, dropout=0.20

### Architecture Details

```
Input:   (B, 3, 256, 256) RGB images in range [-1, 1]
Encoder: 6-layer CNN with residual blocks (256→128→64→32→16→8→4)
Latent:  Fully connected projection to latent_dim
Decoder: 6-layer TransposeCNN with residual blocks (4→8→16→32→64→128→256)
Output:  (B, 3, 256, 256) reconstructed images + (B, latent_dim) latent codes
```

## Training Details

- **Dataset**: Real images (256×256 resolution)
- **Loss**: MSE (mean squared error)
- **Optimizer**: AdamW with weight decay
- **Training**: 100+ epochs with validation monitoring
- **Best validation loss (MSE)**:
  - Model A: 0.025486
  - Model B: 0.025033

## Usage

```python
import torch
from PIL import Image
from torchvision import transforms

from model import ResidualConvAutoencoder, load_model

# Option 1: Load a pre-trained checkpoint
model, checkpoint = load_model('model_a_best.pth', latent_dim=512, dropout=0.15)

# Option 2: Create an untrained model from scratch instead of Option 1
# model = ResidualConvAutoencoder(latent_dim=512, dropout=0.15)

model.eval()  # disable dropout for inference

# Preprocess: resize and normalize to [-1, 1], the range used in training
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),                    # PIL image -> tensor in [0, 1]
    transforms.Lambda(lambda x: x * 2 - 1),   # [0, 1] -> [-1, 1]
])

image = Image.open('example.jpg').convert('RGB')  # replace with your image path

# Inference
with torch.no_grad():
    img_tensor = transform(image).unsqueeze(0)  # (1, 3, 256, 256)
    reconstructed, latent = model(img_tensor)

    # Reconstruction error (MSE averaged over all pixels and channels)
    error = torch.nn.functional.mse_loss(reconstructed, img_tensor)

print(f"Reconstruction MSE: {error.item():.6f}")
```

## Model Files

- `model_a_best.pth` - Model A checkpoint (latent_dim=512)
- `model_b_best.pth` - Model B checkpoint (latent_dim=768)
- `model.py` - Model architecture definition
- `config.json` - Training configuration
- `training_history.json` - Full training metrics

## Research Findings

**Important note**: These models were trained as image reconstruction autoencoders. Testing revealed that they behave as **enhancement/denoising models** rather than anomaly detectors:

- ✅ Successfully reconstructs natural images
- ✅ Can denoise corrupted images (JPEG artifacts, blur, contrast changes)
- ⚠️ Not suitable for detecting modern AI-generated images
- ⚠️ Shows negative discrimination for some degraded images (reconstructs them better than real images)

### Performance on Synthetic Corruptions

| Corruption Type | Separation from Real (reconstruction error) |
|-----------------|---------------------------------------------|
| Noise Added | +122.1% ✅ |
| Color Shifted | +23.8% ⚠️ |
| Patch Corrupted | +12.6% ❌ |
| JPEG Compressed | -9.8% ❌ |
| Contrast Altered | -90.1% ❌ |
| Blurred | -92.5% ❌ |

Negative percentages indicate that the model reconstructs corrupted images *better* than real images (a denoising effect).

## Limitations

1. **Not an anomaly detector**: The models enhance/denoise inputs rather than faithfully reconstruct them, so degraded inputs can produce *lower* reconstruction error than real images.
2. **Poor for fake detection**: They cannot reliably distinguish modern AI-generated images from real ones.
3. **Pixel-space limitations**: Modern AI-generated images are statistically similar to real images in pixel space.

## Recommended Use Cases

- ✅ Image denoising and enhancement
- ✅ Feature extraction (latent representations)
- ✅ Image compression/reconstruction
- ✅ Transfer learning backbone
- ❌ Fake image detection (use supervised classifiers instead)
- ❌ Anomaly detection (use a different approach)

## Citation

If you use these models in your research, please cite:

```
@misc{residual_autoencoder_ensemble_2024,
  author       = {ash12321},
  title        = {Residual Convolutional Autoencoder Ensemble},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ash12321/residual-autoencoder-ensemble}}
}
```

## License

MIT License. See the LICENSE file for details.

## Contact

For questions or issues, please open an issue on the Hugging Face model page.
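
## Appendix: Checking the Resolution Ladder

The Architecture Details list an encoder ladder of 256→128→64→32→16→8→4 and a mirrored decoder ladder. Both are consistent with six stride-2 stages. The sketch below checks the arithmetic using PyTorch's documented output-size formulas for `Conv2d` and `ConvTranspose2d`. It assumes kernel size 4, stride 2 and padding 1. Those values are an assumption, because `model.py` is not reproduced in this card, and the helper names are hypothetical.

```python
from typing import Callable

# Sketch: verify the 256 -> 4 -> 256 spatial ladder with PyTorch's output-size
# formulas (dilation=1). Kernel 4 / stride 2 / padding 1 is an ASSUMPTION;
# the real hyperparameters are defined in model.py.


def conv_out(size: int, kernel: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Spatial output size of nn.Conv2d."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size: int, kernel: int = 4, stride: int = 2,
                       padding: int = 1, output_padding: int = 0) -> int:
    """Spatial output size of nn.ConvTranspose2d."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def ladder(start: int, steps: int, fn: Callable[[int], int]) -> list[int]:
    """Apply a size function `steps` times and record every intermediate size."""
    sizes = [start]
    for _ in range(steps):
        sizes.append(fn(sizes[-1]))
    return sizes


encoder_sizes = ladder(256, 6, conv_out)
decoder_sizes = ladder(4, 6, conv_transpose_out)
print("encoder:", encoder_sizes)
print("decoder:", decoder_sizes)
```

Other settings give the same ladder. For example, kernel 3, stride 2 and padding 1 work, with `output_padding=1` in the decoder. The two formulas make it easy to check whichever values `model.py` actually uses.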
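
## Appendix: Computing Separation from Real

The card does not define how "Separation from Real" in the Performance on Synthetic Corruptions table is computed. One plausible reading fits the note that negative values mean corrupted images are reconstructed better: it is the relative difference in mean per-image reconstruction error, `(mean_corrupted - mean_real) / mean_real × 100`. The sketch below implements that *assumed* definition in plain Python. `separation_percent` is a hypothetical helper, not part of `model.py`, and the error values are toy numbers for illustration, not measured results.

```python
from statistics import mean


def separation_percent(real_errors: list[float], corrupted_errors: list[float]) -> float:
    """Hypothetical metric (ASSUMED definition, not confirmed by the model card):
    relative change in mean reconstruction error of corrupted vs. real images.

    Positive -> corrupted images reconstruct worse (higher error) than real ones.
    Negative -> corrupted images reconstruct better (the denoising effect).
    """
    real_mean = mean(real_errors)
    if real_mean == 0:
        raise ValueError("mean real-image error must be non-zero")
    return (mean(corrupted_errors) - real_mean) / real_mean * 100.0


# Toy per-image MSE values, for illustration only (not measured results)
real = [0.020, 0.025, 0.030]
noisy = [0.050, 0.055, 0.060]
blurred = [0.002, 0.002, 0.002]

print(f"noisy:   {separation_percent(real, noisy):+.1f}%")
print(f"blurred: {separation_percent(real, blurred):+.1f}%")
```

Under this reading, an anomaly score is only useful if corrupted or fake images land well above 0%. The Blurred and Contrast Altered rows in the table land far below it.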