# Residual Convolutional Autoencoder Ensemble

Deep learning models for image reconstruction using residual convolutional autoencoders.

## Model Architecture

Two variants of a deep convolutional autoencoder with residual blocks:

- **Model A**: latent_dim=512, dropout=0.15
- **Model B**: latent_dim=768, dropout=0.20

### Architecture Details

```
Input: (B, 3, 256, 256) RGB images in range [-1, 1]
Encoder: 6-layer CNN with residual blocks (256β†’128β†’64β†’32β†’16β†’8β†’4)
Latent: Fully connected projection to latent_dim
Decoder: 6-layer TransposeCNN with residual blocks (4β†’8β†’16β†’32β†’64β†’128β†’256)
Output: (B, 3, 256, 256) Reconstructed images + (B, latent_dim) latent codes
```
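The actual implementation lives in `model.py`, which is not reproduced on this card. As a rough illustration of the shape described above, a residual-block encoder/decoder could look like the following sketch; the channel widths, normalization, and activation choices here are assumptions, not the shipped code:

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two 3x3 convs with a skip connection; channel count is preserved."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.body(x))


class ResidualConvAutoencoderSketch(nn.Module):
    """Hypothetical reconstruction of the card's architecture:
    six stride-2 stages (256 -> 4), an FC bottleneck, and a mirrored decoder."""

    def __init__(self, latent_dim=512, dropout=0.15, base=32):
        super().__init__()
        chs = [base * 2 ** min(i, 4) for i in range(6)]  # e.g. 32 ... 512
        enc, in_ch = [], 3
        for ch in chs:  # each stage halves the spatial size
            enc += [nn.Conv2d(in_ch, ch, 4, stride=2, padding=1),
                    nn.ReLU(inplace=True), ResidualBlock(ch)]
            in_ch = ch
        self.encoder = nn.Sequential(*enc)
        self.to_latent = nn.Sequential(
            nn.Flatten(), nn.Dropout(dropout),
            nn.Linear(chs[-1] * 4 * 4, latent_dim))
        self.from_latent = nn.Linear(latent_dim, chs[-1] * 4 * 4)
        dec = []
        for ch_in, ch_out in zip(chs[::-1], chs[::-1][1:] + [chs[0]]):
            dec += [nn.ConvTranspose2d(ch_in, ch_out, 4, stride=2, padding=1),
                    nn.ReLU(inplace=True), ResidualBlock(ch_out)]
        dec += [nn.Conv2d(chs[0], 3, 3, padding=1), nn.Tanh()]  # output in [-1, 1]
        self.decoder = nn.Sequential(*dec)
        self._bottom = chs[-1]

    def forward(self, x):
        z = self.to_latent(self.encoder(x))
        h = self.from_latent(z).view(-1, self._bottom, 4, 4)
        return self.decoder(h), z
```

The final `Tanh` keeps outputs in the same `[-1, 1]` range as the inputs, which is what makes the MSE loss below well-scaled.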

## Training Details

- **Dataset**: Real images (256x256 resolution)
- **Loss**: MSE (Mean Squared Error)
- **Optimizer**: AdamW with weight decay
- **Training**: 100+ epochs with validation monitoring
- **Best Validation Loss**: 
  - Model A: 0.025486
  - Model B: 0.025033
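
The training script itself is not included on this card. A minimal epoch loop consistent with the settings above (MSE reconstruction loss, AdamW) might look like this sketch; the learning rate, weight decay, and data pipeline are assumptions:

```python
import torch
import torch.nn.functional as F


def train_epoch(model, loader, opt, device="cpu"):
    """One epoch of MSE reconstruction training."""
    model.train()
    total, n = 0.0, 0
    for imgs in loader:          # imgs: (B, 3, 256, 256) in [-1, 1]
        imgs = imgs.to(device)
        recon, _ = model(imgs)   # model returns (reconstruction, latent)
        loss = F.mse_loss(recon, imgs)
        opt.zero_grad()
        loss.backward()
        opt.step()
        total += loss.item() * imgs.size(0)
        n += imgs.size(0)
    return total / n             # mean loss over the epoch


# Hypothetical optimizer settings; the actual values are in config.json:
# opt = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)
```

The returned mean loss is what you would monitor on a validation split to pick the `*_best.pth` checkpoints.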

## Usage

```python
import torch
from PIL import Image
from torchvision import transforms
from model import ResidualConvAutoencoder, load_model

# Option 1: Load a pre-trained checkpoint
model, checkpoint = load_model('model_a_best.pth', latent_dim=512, dropout=0.15)

# Option 2: Create an untrained model from scratch
# model = ResidualConvAutoencoder(latent_dim=512, dropout=0.15)

model.eval()

# Prepare an image (resize and normalize to [-1, 1])
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),                  # PIL image -> [0, 1] tensor
    transforms.Lambda(lambda x: x * 2 - 1)  # [0, 1] -> [-1, 1]
])
image = Image.open('example.jpg').convert('RGB')

# Inference
with torch.no_grad():
    img_tensor = transform(image).unsqueeze(0)  # add batch dimension
    reconstructed, latent = model(img_tensor)

    # Reconstruction error (MSE)
    error = torch.nn.functional.mse_loss(reconstructed, img_tensor)
```
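
The card describes the pair as an ensemble but shows only single-model inference. One common way to combine the two members is to average their per-image reconstruction errors; the helper below is a sketch of that idea, not an API shipped with the models:

```python
import torch
import torch.nn.functional as F


def ensemble_reconstruction_error(models, img_tensor):
    """Average per-image MSE across ensemble members.

    `models` is any iterable of autoencoders returning (reconstruction, latent);
    `img_tensor` has shape (B, 3, H, W). Returns a (B,) tensor of errors.
    """
    errors = []
    with torch.no_grad():
        for m in models:
            m.eval()
            recon, _ = m(img_tensor)
            # per-image MSE: mean over channels and pixels, kept per batch item
            err = F.mse_loss(recon, img_tensor, reduction="none").mean(dim=(1, 2, 3))
            errors.append(err)
    return torch.stack(errors).mean(dim=0)


# Usage with the two checkpoints (load_model as in the snippet above):
# model_a, _ = load_model('model_a_best.pth', latent_dim=512, dropout=0.15)
# model_b, _ = load_model('model_b_best.pth', latent_dim=768, dropout=0.20)
# scores = ensemble_reconstruction_error([model_a, model_b], img_tensor)
```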

## Model Files

- `model_a_best.pth` - Model A checkpoint (latent_dim=512)
- `model_b_best.pth` - Model B checkpoint (latent_dim=768)
- `model.py` - Model architecture definition
- `config.json` - Training configuration
- `training_history.json` - Full training metrics

## Research Findings

**Important Note**: These models were trained as image reconstruction autoencoders. Testing revealed they function as **enhancement/denoising models** rather than anomaly detectors:

- βœ… Successfully reconstructs natural images
- βœ… Can denoise corrupted images (JPEG artifacts, blur, contrast)
- ⚠️ Not suitable for detecting modern AI-generated images
- ⚠️ Shows negative discrimination for degraded images (reconstructs them better)

### Performance on Synthetic Corruptions

| Corruption Type | Separation from Real |
|----------------|---------------------|
| Noise Added | +122.1% βœ… |
| Color Shifted | +23.8% ⚠️ |
| Patch Corrupted | +12.6% ❌ |
| JPEG Compressed | -9.8% ❌ |
| Contrast Altered | -90.1% ❌ |
| Blurred | -92.5% ❌ |

Negative percentages indicate the model reconstructs corrupted images *better* than real images (denoising effect).
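
The card does not define the "separation" metric exactly; presumably it is the relative change in mean reconstruction error on corrupted images versus real ones. Under that assumption it can be computed as:

```python
def separation_pct(err_corrupted, err_real):
    """Relative change of mean reconstruction error vs. real images, in percent.

    Positive: corrupted images reconstruct worse (a usable detection signal).
    Negative: the model reconstructs them *better* (denoising effect).
    """
    mean = lambda xs: sum(xs) / len(xs)
    return 100.0 * (mean(err_corrupted) - mean(err_real)) / mean(err_real)


# Illustrative numbers only: noisy images with roughly double the error
# of real ones give a separation of about +124%.
# separation_pct([0.056], [0.025])
```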

## Limitations

1. **Not an anomaly detector**: Models enhance/denoise rather than faithfully reconstruct
2. **Poor for fake detection**: Cannot reliably distinguish modern AI-generated images from real ones
3. **Pixel-space limitations**: Modern AI images are statistically similar to real images in pixel space

## Recommended Use Cases

βœ… Image denoising and enhancement  
βœ… Feature extraction (latent representations)  
βœ… Image compression/reconstruction  
βœ… Transfer learning backbone  
❌ Fake image detection (use supervised classifiers instead)  
❌ Anomaly detection (use different approach)

## Citation

If you use these models in your research, please cite:

```
@misc{residual_autoencoder_ensemble_2024,
  author = {ash12321},
  title = {Residual Convolutional Autoencoder Ensemble},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ash12321/residual-autoencoder-ensemble}}
}
```

## License

MIT License - See LICENSE file for details

## Contact

For questions or issues, please open an issue on the Hugging Face model page.