# Residual Convolutional Autoencoder Ensemble
Deep learning models for image reconstruction using residual convolutional autoencoders.
## Model Architecture
Two variants of a deep convolutional autoencoder with residual blocks:
- **Model A**: latent_dim=512, dropout=0.15
- **Model B**: latent_dim=768, dropout=0.20
### Architecture Details
```
Input: (B, 3, 256, 256) RGB images in range [-1, 1]
Encoder: 6-layer CNN with residual blocks (256→128→64→32→16→8→4)
Latent: Fully connected projection to latent_dim
Decoder: 6-layer TransposeCNN with residual blocks (4→8→16→32→64→128→256)
Output: (B, 3, 256, 256) Reconstructed images + (B, latent_dim) latent codes
```
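The shape flow above can be sketched in PyTorch. This is an illustrative re-implementation, not the shipped `model.py`: the class name, per-stage channel widths, and normalization choices are assumptions; only the stage count, stride-2 downsampling, residual blocks, latent projection, and [-1, 1] output range follow the description above.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convs with an identity skip (same channels, same resolution)."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

class SketchAutoencoder(nn.Module):
    """6-stage encoder (256->4) and decoder (4->256); widths are assumptions."""
    def __init__(self, latent_dim=512, dropout=0.15):
        super().__init__()
        chs = [3, 32, 64, 128, 256, 256, 256]  # assumed channel widths
        enc = []
        for cin, cout in zip(chs[:-1], chs[1:]):
            enc += [nn.Conv2d(cin, cout, 4, stride=2, padding=1),  # halves H and W
                    nn.ReLU(inplace=True),
                    ResidualBlock(cout)]
        self.encoder = nn.Sequential(*enc)  # (B, 3, 256, 256) -> (B, 256, 4, 4)
        self.to_latent = nn.Sequential(
            nn.Flatten(), nn.Dropout(dropout), nn.Linear(chs[-1] * 4 * 4, latent_dim))
        self.from_latent = nn.Linear(latent_dim, chs[-1] * 4 * 4)
        rev = chs[::-1]  # [256, 256, 256, 128, 64, 32, 3]
        dec = []
        for cin, cout in zip(rev[:-1], rev[1:]):
            dec.append(nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1))
            if cout == 3:
                dec.append(nn.Tanh())  # output matches the [-1, 1] input range
            else:
                dec += [nn.ReLU(inplace=True), ResidualBlock(cout)]
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        z = self.to_latent(self.encoder(x))
        recon = self.decoder(self.from_latent(z).view(-1, 256, 4, 4))
        return recon, z
```

Six stride-2 stages take 256 to 4 on each side, so the latent projection sees a 256×4×4 feature map; the decoder mirrors this with transposed convolutions.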
## Training Details
- **Dataset**: Real images (256x256 resolution)
- **Loss**: MSE (Mean Squared Error)
- **Optimizer**: AdamW with weight decay
- **Training**: 100+ epochs with validation monitoring
- **Best Validation Loss**:
- Model A: 0.025486
- Model B: 0.025033
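A minimal version of this recipe (AdamW + MSE with per-epoch validation monitoring) might look like the sketch below. The function name, hyperparameter values, and loader shapes are illustrative assumptions, not code from this repo; the model is assumed to return `(reconstruction, latent)` as described above.

```python
import torch
import torch.nn as nn

def run_epoch(model, loader, opt=None, device="cpu"):
    """One pass over `loader` (an iterable of image batches).

    Trains when an optimizer is given, otherwise evaluates.
    Returns the mean per-sample MSE reconstruction loss.
    """
    training = opt is not None
    model.train(training)
    total, n = 0.0, 0
    with torch.set_grad_enabled(training):
        for x in loader:
            x = x.to(device)
            recon, _ = model(x)
            loss = nn.functional.mse_loss(recon, x)
            if training:
                opt.zero_grad()
                loss.backward()
                opt.step()
            total += loss.item() * x.size(0)
            n += x.size(0)
    return total / n

# Sketch of the outer loop with best-checkpoint tracking:
# opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
# best = float("inf")
# for epoch in range(100):
#     run_epoch(model, train_loader, opt)
#     val = run_epoch(model, val_loader)
#     if val < best:
#         best = val
#         torch.save(model.state_dict(), "model_a_best.pth")
```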
## Usage
```python
import torch
from PIL import Image
from torchvision import transforms
from model import ResidualConvAutoencoder, load_model

# Option 1: Load pre-trained model
model, checkpoint = load_model('model_a_best.pth', latent_dim=512, dropout=0.15)

# Option 2: Create an untrained model from scratch
# model = ResidualConvAutoencoder(latent_dim=512, dropout=0.15)

model.eval()

# Prepare image (normalize to [-1, 1])
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x * 2 - 1),  # [0, 1] -> [-1, 1]
])
image = Image.open('your_image.jpg').convert('RGB')  # placeholder path

# Inference
with torch.no_grad():
    img_tensor = transform(image).unsqueeze(0)
    reconstructed, latent = model(img_tensor)

# Get reconstruction error
error = torch.nn.functional.mse_loss(reconstructed, img_tensor)
```
## Model Files
- `model_a_best.pth` - Model A checkpoint (latent_dim=512)
- `model_b_best.pth` - Model B checkpoint (latent_dim=768)
- `model.py` - Model architecture definition
- `config.json` - Training configuration
- `training_history.json` - Full training metrics
## Research Findings
**Important Note**: These models were trained as image reconstruction autoencoders. Testing revealed they function as **enhancement/denoising models** rather than anomaly detectors:
- ✅ Successfully reconstructs natural images
- ✅ Can denoise corrupted images (JPEG artifacts, blur, contrast)
- ⚠️ Not suitable for detecting modern AI-generated images
- ⚠️ Shows negative discrimination for degraded images (reconstructs them better)
### Performance on Synthetic Corruptions
| Corruption Type  | Separation from Real |
|------------------|----------------------|
| Noise Added      | +122.1% ✅           |
| Color Shifted    | +23.8% ⚠️            |
| Patch Corrupted  | +12.6% ❌            |
| JPEG Compressed  | -9.8% ❌             |
| Contrast Altered | -90.1% ❌            |
| Blurred          | -92.5% ❌            |
Negative percentages indicate the model reconstructs corrupted images *better* than real images (denoising effect).
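The separation figures above compare mean reconstruction error on corrupted versus real images. A hedged sketch of that metric, assuming the `(reconstruction, latent)` forward signature from the Usage section (the helper name is ours, not from the repo):

```python
import torch

@torch.no_grad()
def separation_pct(model, real, corrupted):
    """Relative MSE gap in percent.

    Positive: corrupted images reconstruct worse than real ones.
    Negative: the model reconstructs corruptions *better* (denoising effect).
    """
    model.eval()

    def recon_err(x):
        recon, _ = model(x)
        return torch.nn.functional.mse_loss(recon, x).item()

    e_real = recon_err(real)
    return 100.0 * (recon_err(corrupted) - e_real) / e_real
```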
## Limitations
1. **Not an anomaly detector**: Models enhance/denoise rather than faithfully reconstruct
2. **Poor for fake detection**: Cannot reliably distinguish modern AI-generated images from real ones
3. **Pixel-space limitations**: Modern AI images are statistically similar to real images in pixel space
## Recommended Use Cases
- ✅ Image denoising and enhancement
- ✅ Feature extraction (latent representations)
- ✅ Image compression/reconstruction
- ✅ Transfer learning backbone
- ❌ Fake image detection (use supervised classifiers instead)
- ❌ Anomaly detection (use a different approach)
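For the feature-extraction use case, the latent codes returned by the forward pass can be collected directly. A small sketch, assuming the `(reconstruction, latent)` return signature; the helper name is ours:

```python
import torch

@torch.no_grad()
def extract_latents(model, batches):
    """Stack latent codes from an iterable of image batches,
    for downstream clustering, retrieval, or transfer learning."""
    model.eval()
    return torch.cat([model(x)[1] for x in batches])
```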
## Citation
If you use these models in your research, please cite:
```
@model{residual_autoencoder_ensemble_2024,
author = {ash12321},
title = {Residual Convolutional Autoencoder Ensemble},
year = {2024},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/ash12321/residual-autoencoder-ensemble}}
}
```
## License
MIT License - See LICENSE file for details
## Contact
For questions or issues, please open an issue on the Hugging Face model page.