# Residual Convolutional Autoencoder Ensemble

Deep learning models for image reconstruction using residual convolutional autoencoders.

## Model Architecture

Two variants of a deep convolutional autoencoder with residual blocks:

- **Model A**: latent_dim=512, dropout=0.15
- **Model B**: latent_dim=768, dropout=0.20

### Architecture Details

```
Input: (B, 3, 256, 256) RGB images in range [-1, 1]
Encoder: 6-layer CNN with residual blocks (256→128→64→32→16→8→4)
Latent: Fully connected projection to latent_dim
Decoder: 6-layer TransposeCNN with residual blocks (4→8→16→32→64→128→256)
Output: (B, 3, 256, 256) Reconstructed images + (B, latent_dim) latent codes
```
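
The spatial sizes listed above can be checked with the standard output-size formulas for strided and transposed convolutions. The sketch below is only an illustration: the kernel sizes, strides, and paddings are assumptions (the actual values live in `model.py`, which is not reproduced here), chosen so that each layer halves or doubles the resolution.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output size of a strided convolution (dilation 1). Hyperparameters are assumed."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Output size of a transposed convolution (dilation 1). Hyperparameters are assumed."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def trace(size, step, layers=6):
    """Apply `step` repeatedly and record the spatial size after each layer."""
    sizes = [size]
    for _ in range(layers):
        size = step(size)
        sizes.append(size)
    return sizes


encoder_sizes = trace(256, conv_out)          # six downsampling layers
decoder_sizes = trace(4, conv_transpose_out)  # six upsampling layers

print(" -> ".join(map(str, encoder_sizes)))
print(" -> ".join(map(str, decoder_sizes)))
```

With these assumed settings, six stride-2 layers take a 256-pixel side down to 4, and six transposed layers bring it back to 256, matching the chains in the diagram.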

## Training Details

- **Dataset**: Real images (256x256 resolution)
- **Loss**: MSE (Mean Squared Error)
- **Optimizer**: AdamW with weight decay
- **Training**: 100+ epochs with validation monitoring
- **Best Validation Loss**:
  - Model A: 0.025486
  - Model B: 0.025033
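
To make the validation losses easier to interpret, they can be converted to RMSE and PSNR. This is a rough sketch that assumes the reported loss is the mean per-pixel MSE on tensors scaled to [-1, 1], so the peak-to-peak signal range is 2; the helper name `mse_to_psnr` is hypothetical and not part of `model.py`.

```python
import math


def mse_to_psnr(mse, data_range=2.0):
    """PSNR in dB for a given MSE; data_range=2.0 assumes inputs in [-1, 1]."""
    return 10.0 * math.log10(data_range ** 2 / mse)


# Best validation losses quoted above
best_val_mse = {"Model A": 0.025486, "Model B": 0.025033}

for name, mse in best_val_mse.items():
    rmse = math.sqrt(mse)
    print(f"{name}: RMSE={rmse:.4f}, PSNR={mse_to_psnr(mse):.2f} dB")
```

If the loss was averaged differently during training (for example, summed over channels), the conversion would need to be adjusted accordingly.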

## Usage

```python
import torch
from PIL import Image
from torchvision import transforms

from model import ResidualConvAutoencoder, load_model

# Option 1: Load pre-trained model
model, checkpoint = load_model('model_a_best.pth', latent_dim=512, dropout=0.15)

# Option 2: Create from scratch (untrained weights); uncomment to use instead
# model = ResidualConvAutoencoder(latent_dim=512, dropout=0.15)

model.eval()

# Prepare image (normalize to [-1, 1])
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x * 2 - 1),  # [0,1] -> [-1,1]
])

# `image` must be an RGB PIL image; 'example.jpg' is a placeholder path
image = Image.open('example.jpg').convert('RGB')

# Inference
with torch.no_grad():
    img_tensor = transform(image).unsqueeze(0)  # add batch dimension
    reconstructed, latent = model(img_tensor)

# Get reconstruction error
error = torch.nn.functional.mse_loss(reconstructed, img_tensor)
```
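
The usage example runs a single model. One possible way to use both checkpoints together is sketched below; this card does not specify how the ensemble is meant to be combined, so averaging the reconstructions and concatenating the latent codes are assumptions. `StandInAutoencoder` is a hypothetical placeholder that only mimics the `(reconstruction, latent)` return signature so the snippet runs without the checkpoints; in practice the two models returned by `load_model` would take its place.

```python
import torch
from torch import nn


class StandInAutoencoder(nn.Module):
    """Hypothetical stand-in returning (reconstruction, latent) like the real model."""

    def __init__(self, latent_dim):
        super().__init__()
        self.encode = nn.Sequential(
            nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(3 * 4 * 4, latent_dim)
        )
        self.decode = nn.Sequential(
            nn.Linear(latent_dim, 3 * 4 * 4),
            nn.Unflatten(1, (3, 4, 4)),
            nn.Upsample(size=(256, 256)),
            nn.Tanh(),  # keep outputs in [-1, 1]
        )

    def forward(self, x):
        latent = self.encode(x)
        return self.decode(latent), latent


@torch.no_grad()
def ensemble_reconstruct(models, batch):
    """Average reconstructions, concatenate latents, and score each image by MSE."""
    recons, latents = [], []
    for model in models:
        model.eval()
        recon, latent = model(batch)
        recons.append(recon)
        latents.append(latent)
    mean_recon = torch.stack(recons).mean(dim=0)
    per_image_error = ((mean_recon - batch) ** 2).mean(dim=(1, 2, 3))
    return mean_recon, torch.cat(latents, dim=1), per_image_error


models = [StandInAutoencoder(512), StandInAutoencoder(768)]
batch = torch.rand(2, 3, 256, 256) * 2 - 1  # random images in [-1, 1]
mean_recon, joint_latent, errors = ensemble_reconstruct(models, batch)
print(mean_recon.shape, joint_latent.shape, errors.shape)
```

Concatenating the two latent codes gives a 1280-dimensional feature vector per image (512 + 768), which may be convenient for feature extraction.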

## Model Files

- `model_a_best.pth` - Model A checkpoint (latent_dim=512)
- `model_b_best.pth` - Model B checkpoint (latent_dim=768)
- `model.py` - Model architecture definition
- `config.json` - Training configuration
- `training_history.json` - Full training metrics

## Research Findings

**Important Note**: These models were trained as image reconstruction autoencoders. Testing revealed they function as **enhancement/denoising models** rather than anomaly detectors:

- ✅ Successfully reconstructs natural images
- ✅ Can denoise corrupted images (JPEG artifacts, blur, contrast)
- ⚠️ Not suitable for detecting modern AI-generated images
- ⚠️ Shows negative discrimination for degraded images (reconstructs them better)

### Performance on Synthetic Corruptions

| Corruption Type | Separation from Real |
|------------------|----------------------|
| Noise Added | +122.1% ✅ |
| Color Shifted | +23.8% ⚠️ |
| Patch Corrupted | +12.6% ❌ |
| JPEG Compressed | -9.8% ❌ |
| Contrast Altered | -90.1% ❌ |
| Blurred | -92.5% ❌ |

Negative percentages indicate the model reconstructs corrupted images *better* than real images (denoising effect).
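
The card does not state how "separation" is calculated. One reading that matches the sign convention above is the relative change in mean reconstruction error between corrupted and real images. The sketch below implements that reading as an assumption; the function name and the sample error values are made up for illustration and are not the evaluation data.

```python
from statistics import mean


def separation_pct(real_errors, corrupted_errors):
    """Relative change in mean reconstruction error, in percent (assumed definition).

    Positive: corrupted images reconstruct worse than real ones.
    Negative: corrupted images reconstruct better (lower error) than real ones.
    """
    real_mean = mean(real_errors)
    return (mean(corrupted_errors) - real_mean) / real_mean * 100.0


# Illustrative per-image MSE values (made up, not measured)
real = [0.020, 0.030, 0.025]
noisy = [0.050, 0.060, 0.055]
blurred = [0.002, 0.003, 0.001]

print(f"noise:   {separation_pct(real, noisy):+.1f}%")
print(f"blurred: {separation_pct(real, blurred):+.1f}%")
```

Under this definition, a detector that thresholds on reconstruction error can only flag corruptions with a clearly positive separation, which is why the negative rows rule out anomaly detection.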

## Limitations

1. **Not an anomaly detector**: Models enhance/denoise rather than faithfully reconstruct their inputs
2. **Poor for fake detection**: Cannot reliably distinguish modern AI-generated images from real ones
3. **Pixel-space limitations**: Modern AI images are statistically similar to real images in pixel space

## Recommended Use Cases

- ✅ Image denoising and enhancement
- ✅ Feature extraction (latent representations)
- ✅ Image compression/reconstruction
- ✅ Transfer learning backbone
- ❌ Fake image detection (use supervised classifiers instead)
- ❌ Anomaly detection (use a different approach)

## Citation

If you use these models in your research, please cite:

```
@model{residual_autoencoder_ensemble_2024,
  author = {ash12321},
  title = {Residual Convolutional Autoencoder Ensemble},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ash12321/residual-autoencoder-ensemble}}
}
```

## License

MIT License. See the LICENSE file for details.

## Contact

For questions or issues, please open an issue on the Hugging Face model page.