# Residual Convolutional Autoencoder Ensemble

Deep learning models for image reconstruction using residual convolutional autoencoders.

## Model Architecture

Two variants of a deep convolutional autoencoder with residual blocks:

- **Model A**: latent_dim=512, dropout=0.15
- **Model B**: latent_dim=768, dropout=0.20

### Architecture Details

```
Input:   (B, 3, 256, 256) RGB images in range [-1, 1]
Encoder: 6-layer CNN with residual blocks (256→128→64→32→16→8→4)
Latent:  Fully connected projection to latent_dim
Decoder: 6-layer transposed CNN with residual blocks (4→8→16→32→64→128→256)
Output:  (B, 3, 256, 256) reconstructed images + (B, latent_dim) latent codes
```
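
The actual implementation lives in `model.py` (not included here); a minimal sketch of one encoder stage consistent with the shapes above, with hypothetical layer choices (BatchNorm, ReLU), might look like:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One encoder stage: two 3x3 convs with a skip connection.
    stride=2 halves the spatial resolution (e.g. 256 -> 128).
    Illustrative only; the real block definition is in model.py."""
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 conv so the skip path matches channels and resolution
        self.skip = nn.Conv2d(in_ch, out_ch, 1, stride=stride)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + self.skip(x))

block = ResidualBlock(3, 64)
out = block(torch.randn(1, 3, 256, 256))  # shape (1, 64, 128, 128)
```

Stacking six such stages takes a 256×256 input down to the 4×4 feature map that feeds the latent projection.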

## Training Details

- **Dataset**: Real images (256×256 resolution)
- **Loss**: MSE (mean squared error)
- **Optimizer**: AdamW with weight decay
- **Training**: 100+ epochs with validation monitoring
- **Best Validation Loss**:
  - Model A: 0.025486
  - Model B: 0.025033
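
The training script is not part of this repository; a minimal sketch of one AdamW + MSE training step under the setup above (using a tiny stand-in module with the same `(reconstruction, latent)` return signature, since the real `ResidualConvAutoencoder` is in `model.py`):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAE(nn.Module):
    """Stand-in for ResidualConvAutoencoder, for illustration only."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(3, 8, 3, stride=2, padding=1)
        self.dec = nn.ConvTranspose2d(8, 3, 4, stride=2, padding=1)

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z.flatten(1)

model = TinyAE()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)

def train_step(batch):
    """One optimization step: reconstruct the batch, minimize MSE."""
    model.train()
    optimizer.zero_grad()
    reconstructed, _latent = model(batch)
    loss = F.mse_loss(reconstructed, batch)  # reconstruction target = input
    loss.backward()
    optimizer.step()
    return loss.item()

# Small images for the demo; actual training used 256x256 inputs in [-1, 1]
loss = train_step(torch.rand(2, 3, 64, 64) * 2 - 1)
```

The learning rate and weight-decay values here are placeholders; the values actually used are recorded in `config.json`.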

## Usage

```python
import torch
from PIL import Image
from torchvision import transforms
from model import ResidualConvAutoencoder, load_model

# Option 1: load a pre-trained checkpoint
model, checkpoint = load_model('model_a_best.pth', latent_dim=512, dropout=0.15)

# Option 2: create the model from scratch
model = ResidualConvAutoencoder(latent_dim=512, dropout=0.15)
model.eval()

# Prepare an image (normalize to [-1, 1])
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x * 2 - 1)  # [0, 1] -> [-1, 1]
])
image = Image.open('your_image.jpg').convert('RGB')  # any RGB image

# Inference
with torch.no_grad():
    img_tensor = transform(image).unsqueeze(0)
    reconstructed, latent = model(img_tensor)

# Reconstruction error
error = torch.nn.functional.mse_loss(reconstructed, img_tensor)
```

## Model Files

- `model_a_best.pth` - Model A checkpoint (latent_dim=512)
- `model_b_best.pth` - Model B checkpoint (latent_dim=768)
- `model.py` - Model architecture definition
- `config.json` - Training configuration
- `training_history.json` - Full training metrics
## Research Findings

**Important Note**: These models were trained as image reconstruction autoencoders. Testing revealed that they function as **enhancement/denoising models** rather than anomaly detectors:

- ✅ Successfully reconstruct natural images
- ✅ Can denoise corrupted images (JPEG artifacts, blur, contrast)
- ⚠️ Not suitable for detecting modern AI-generated images
- ⚠️ Show negative discrimination for degraded images (reconstruct them better than clean ones)

### Performance on Synthetic Corruptions

| Corruption Type  | Separation from Real |
|------------------|----------------------|
| Noise Added      | +122.1% ✅ |
| Color Shifted    | +23.8% ⚠️ |
| Patch Corrupted  | +12.6% ❌ |
| JPEG Compressed  | -9.8% ❌ |
| Contrast Altered | -90.1% ❌ |
| Blurred          | -92.5% ❌ |

Negative percentages indicate that the model reconstructs corrupted images *better* than real images (a denoising effect).
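
The "separation from real" figures above presumably compare mean reconstruction error on corrupted versus real images; a sketch of that comparison (our own formulation, not the authors' evaluation script):

```python
import torch
import torch.nn.functional as F

def per_image_error(model, images):
    """Mean squared reconstruction error, one value per image."""
    model.eval()
    with torch.no_grad():
        reconstructed, _ = model(images)
        # keep per-element errors, then average over channels and pixels
        return F.mse_loss(reconstructed, images, reduction="none").mean(dim=(1, 2, 3))

def separation_pct(model, real, corrupted):
    """Relative gap between corrupted and real reconstruction error.
    Negative => corrupted images are reconstructed *better* (denoising)."""
    e_real = per_image_error(model, real).mean()
    e_corr = per_image_error(model, corrupted).mean()
    return (100.0 * (e_corr - e_real) / e_real).item()
```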

## Limitations

1. **Not an anomaly detector**: The models enhance/denoise rather than faithfully reconstruct
2. **Poor for fake detection**: They cannot reliably distinguish modern AI-generated images from real ones
3. **Pixel-space limitations**: Modern AI-generated images are statistically similar to real images in pixel space

## Recommended Use Cases

- ✅ Image denoising and enhancement
- ✅ Feature extraction (latent representations)
- ✅ Image compression/reconstruction
- ✅ Transfer learning backbone
- ❌ Fake image detection (use supervised classifiers instead)
- ❌ Anomaly detection (use a different approach)
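
For the feature-extraction use case, latent codes can be collected batch by batch; a hedged sketch, assuming the `(reconstruction, latent)` return signature shown in the Usage section:

```python
import torch

def extract_features(model, images, batch_size=32):
    """Run the autoencoder over a stack of images and collect latent codes."""
    model.eval()
    feats = []
    with torch.no_grad():
        for i in range(0, len(images), batch_size):
            _reconstructed, latent = model(images[i:i + batch_size])
            feats.append(latent)
    return torch.cat(feats)  # shape (N, latent_dim)
```

The resulting `(N, latent_dim)` matrix can feed any downstream classifier or clustering step.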

## Citation

If you use these models in your research, please cite:

```bibtex
@misc{residual_autoencoder_ensemble_2024,
  author       = {ash12321},
  title        = {Residual Convolutional Autoencoder Ensemble},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ash12321/residual-autoencoder-ensemble}}
}
```

## License

MIT License - see the LICENSE file for details.

## Contact

For questions or issues, please open an issue on the Hugging Face model page.