# Residual Convolutional Autoencoder Ensemble

Deep learning models for image reconstruction using residual convolutional autoencoders.

## Model Architecture

Two variants of a deep convolutional autoencoder with residual blocks:

- **Model A**: latent_dim=512, dropout=0.15
- **Model B**: latent_dim=768, dropout=0.20

### Architecture Details

```
Input:   (B, 3, 256, 256) RGB images in range [-1, 1]
Encoder: 6-layer CNN with residual blocks (256 → 128 → 64 → 32 → 16 → 8 → 4)
Latent:  fully connected projection to latent_dim
Decoder: 6-layer transposed-conv CNN with residual blocks (4 → 8 → 16 → 32 → 64 → 128 → 256)
Output:  (B, 3, 256, 256) reconstructed images + (B, latent_dim) latent codes
```
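The shipped `model.py` defines the full architecture. As a minimal sketch of the residual-block pattern named above (layer widths, normalization, and the downsampling conv here are illustrative assumptions, not the released implementation):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convs with a skip connection (a common residual-block form)."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Skip connection: add the input back before the final activation
        return self.act(x + self.body(x))

# Each encoder stage halves the spatial size: 256 -> 128 -> ... -> 4
block = ResidualBlock(64)
down = nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1)
x = torch.randn(1, 64, 256, 256)
y = down(block(x))
print(y.shape)  # torch.Size([1, 128, 128, 128])
```

Stacking six such stages takes a 256×256 input down to the 4×4 map that the fully connected latent projection consumes.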

## Training Details

- **Dataset**: Real images (256×256 resolution)
- **Loss**: MSE (mean squared error)
- **Optimizer**: AdamW with weight decay
- **Training**: 100+ epochs with validation monitoring
- **Best validation loss**:
  - Model A: 0.025486
  - Model B: 0.025033

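The recipe above (MSE reconstruction loss, AdamW with weight decay) amounts to a standard autoencoder training loop. A minimal sketch with a stand-in model; the learning rate, weight decay, and tiny network are illustrative assumptions, not the published settings:

```python
import torch
import torch.nn as nn

# Stand-in for ResidualConvAutoencoder; only the training mechanics matter here.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 3, 3, padding=1),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
criterion = nn.MSELoss()

images = torch.rand(4, 3, 64, 64) * 2 - 1  # fake batch, scaled to [-1, 1]
for step in range(3):  # the real run trained for 100+ epochs over a dataset
    optimizer.zero_grad()
    reconstructed = model(images)
    loss = criterion(reconstructed, images)  # MSE between input and output
    loss.backward()
    optimizer.step()
```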
## Usage

```python
import torch
from PIL import Image
from torchvision import transforms

from model import ResidualConvAutoencoder, load_model

# Option 1: load a pre-trained checkpoint
model, checkpoint = load_model('model_a_best.pth', latent_dim=512, dropout=0.15)

# Option 2: create from scratch (choose one option)
model = ResidualConvAutoencoder(latent_dim=512, dropout=0.15)
model.eval()

# Prepare an image (normalize to [-1, 1])
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x * 2 - 1),  # [0, 1] -> [-1, 1]
])
image = Image.open('example.jpg').convert('RGB')  # path is a placeholder

# Inference
with torch.no_grad():
    img_tensor = transform(image).unsqueeze(0)
    reconstructed, latent = model(img_tensor)

# Reconstruction error
error = torch.nn.functional.mse_loss(reconstructed, img_tensor)
```

## Model Files

- `model_a_best.pth` - Model A checkpoint (latent_dim=512)
- `model_b_best.pth` - Model B checkpoint (latent_dim=768)
- `model.py` - Model architecture definition
- `config.json` - Training configuration
- `training_history.json` - Full training metrics

## Research Findings

**Important note**: These models were trained as image-reconstruction autoencoders. Testing revealed that they function as **enhancement/denoising models** rather than anomaly detectors:

- ✅ Successfully reconstruct natural images
- ✅ Can denoise corrupted images (JPEG artifacts, blur, contrast)
- ⚠️ Not suitable for detecting modern AI-generated images
- ⚠️ Show negative discrimination for degraded images (i.e., reconstruct them *better* than clean ones)

### Performance on Synthetic Corruptions

| Corruption Type  | Separation from Real |
|------------------|----------------------|
| Noise Added      | +122.1% ✅           |
| Color Shifted    | +23.8% ⚠️            |
| Patch Corrupted  | +12.6% ❌            |
| JPEG Compressed  | -9.8% ❌             |
| Contrast Altered | -90.1% ❌            |
| Blurred          | -92.5% ❌            |

Negative percentages indicate the model reconstructs corrupted images *better* than real images (a denoising effect).

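The exact definition of "Separation from Real" is not published here; one plausible reading, assumed for illustration, is the percent change in mean reconstruction error of corrupted images relative to real images:

```python
def separation_pct(real_errors, corrupted_errors):
    """Percent change of mean corrupted-image error vs. mean real-image error.

    Positive: corruption raises reconstruction error (easy to flag).
    Negative: corrupted images reconstruct *better* (the denoising effect).
    NOTE: this metric definition is an assumption, not the repo's exact formula.
    """
    mean_real = sum(real_errors) / len(real_errors)
    mean_corrupted = sum(corrupted_errors) / len(corrupted_errors)
    return 100.0 * (mean_corrupted - mean_real) / mean_real

# Illustrative (made-up) per-image MSE values:
print(separation_pct([0.025, 0.026], [0.056, 0.057]))  # positive: noise-like corruption
print(separation_pct([0.025, 0.026], [0.002, 0.002]))  # negative: denoising effect
```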
## Limitations

1. **Not an anomaly detector**: The models enhance/denoise rather than faithfully reconstruct their input
2. **Poor for fake detection**: Cannot reliably distinguish modern AI-generated images from real ones
3. **Pixel-space limitations**: Modern AI-generated images are statistically similar to real images in pixel space

## Recommended Use Cases

- ✅ Image denoising and enhancement
- ✅ Feature extraction (latent representations)
- ✅ Image compression/reconstruction
- ✅ Transfer-learning backbone
- ❌ Fake-image detection (use supervised classifiers instead)
- ❌ Anomaly detection (use a different approach)

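For the feature-extraction use case, the forward pass already returns the latent code alongside the reconstruction (see Usage above). A sketch with a stand-in model that mimics that return signature (the tiny network itself is an assumption):

```python
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    """Stand-in with the same (reconstruction, latent) return signature."""
    def __init__(self, latent_dim=512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 3 * 8 * 8), nn.Unflatten(1, (3, 8, 8))
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = TinyAutoencoder(latent_dim=512).eval()
with torch.no_grad():
    _, features = model(torch.rand(16, 3, 8, 8))  # latent codes as features
print(features.shape)  # torch.Size([16, 512])
```

The `features` tensor can then feed a downstream classifier or similarity search, which is how the latent representations are most naturally reused.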
## Citation

If you use these models in your research, please cite:

```
@misc{residual_autoencoder_ensemble_2024,
  author       = {ash12321},
  title        = {Residual Convolutional Autoencoder Ensemble},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ash12321/residual-autoencoder-ensemble}}
}
```

## License

MIT License - see the LICENSE file for details.

## Contact

For questions or issues, please open an issue on the Hugging Face model page.