ash12321 committed on
Commit 716ac9d · verified · 1 Parent(s): 35193fc

Upload README.md with huggingface_hub

Files changed (1): README.md (+243, −0)
README.md ADDED
---
license: mit
tags:
- pytorch
- autoencoder
- deepfake-detection
- cifar10
- computer-vision
- image-reconstruction
datasets:
- cifar10
metrics:
- mse
library_name: pytorch
---

# Residual Convolutional Autoencoder for Deepfake Detection

## Model Description

This is a **5-stage Residual Convolutional Autoencoder** trained on CIFAR-10 for image reconstruction. The model reaches a test MSE of 0.00429 and can be used as a foundation for reconstruction-error-based deepfake detection systems.

### Architecture

- **Encoder**: 5 downsampling stages (128→64→32→16→8→4) with residual blocks
- **Latent Dimension**: 512
- **Decoder**: 5 upsampling stages with residual blocks
- **Total Parameters**: 34,849,667
- **Input Size**: 128x128x3 (RGB images)
- **Output Range**: [-1, 1] (Tanh activation)

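The stage resolutions listed above follow from the convolution arithmetic used throughout the network (each downsampling stage is a stride-2 convolution with kernel 4 and padding 1, as in the code under Usage), which halves the spatial size. A quick check:

```python
# Conv2d(kernel_size=4, stride=2, padding=1): out = (in + 2*1 - 4) // 2 + 1
size, sizes = 128, [128]
for _ in range(5):
    size = (size + 2 * 1 - 4) // 2 + 1
    sizes.append(size)
print(" -> ".join(map(str, sizes)))  # 128 -> 64 -> 32 -> 16 -> 8 -> 4
```
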
## Training Details

### Training Data
- **Dataset**: CIFAR-10 (50,000 training images, 10,000 test images)
- **Image Size**: resized to 128x128
- **Normalization**: mean=0.5, std=0.5 per channel (range [-1, 1])
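
The exact data pipeline is not included in this card; below is a minimal sketch consistent with the preprocessing described above (resize to 128x128, per-channel normalization with mean 0.5 and std 0.5, batch size 1024 from the configuration below):

```python
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((128, 128)),              # upscale CIFAR-10's 32x32 images
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5],  # maps [0, 1] to [-1, 1]
                         std=[0.5, 0.5, 0.5])
])

train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=1024, shuffle=True)
```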

### Training Configuration
- **GPU**: NVIDIA H100 80GB HBM3
- **Batch Size**: 1024
- **Optimizer**: AdamW (lr=1e-3, weight_decay=1e-5)
- **Loss Function**: MSE (mean squared error)
- **Scheduler**: ReduceLROnPlateau (factor=0.5, patience=5)
- **Epochs**: 100
- **Training Time**: ~26 minutes
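
The training script itself is not included; the loop below is a minimal sketch that wires together the configuration listed above. It assumes the `ResidualConvAutoencoder` class from the Usage section and the `train_set`/`test_set`/`train_loader` objects from the preprocessing sketch; using the CIFAR-10 test split as the validation set is an assumption.

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = ResidualConvAutoencoder(latent_dim=512).to(device)  # defined under Usage

criterion = torch.nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=5)

val_loader = torch.utils.data.DataLoader(test_set, batch_size=1024)

for epoch in range(100):
    model.train()
    for images, _ in train_loader:        # labels are unused for reconstruction
        images = images.to(device)
        reconstructed, _ = model(images)  # forward returns (reconstruction, latent)
        loss = criterion(reconstructed, images)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Mean reconstruction loss on the held-out split drives the LR schedule
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for images, _ in val_loader:
            images = images.to(device)
            reconstructed, _ = model(images)
            total += criterion(reconstructed, images).item() * images.size(0)
            n += images.size(0)
    scheduler.step(total / n)
```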

### Training Results
- **Initial Validation Loss**: 0.266256 (Epoch 1)
- **Final Validation Loss**: 0.004294 (Epoch 100)
- **Final Test Loss**: 0.004290
- **Improvement**: 98.4% reduction in validation loss (0.266256 → 0.004294)

## Performance

| Metric | Value |
|--------|-------|
| Test MSE loss | 0.004290 |
| Training time | 26.24 minutes |
| Peak GPU memory | ~40 GB |
| Throughput | ~3,600 samples/sec |

## Usage

### Loading the Model

```python
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download

# Define the model architecture
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += residual
        return self.relu(out)

class ResidualConvAutoencoder(nn.Module):
    def __init__(self, latent_dim=512):
        super().__init__()

        # Encoder
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1),    # 128->64
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            ResidualBlock(64),

            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 64->32
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            ResidualBlock(128),

            nn.Conv2d(128, 256, 4, stride=2, padding=1), # 32->16
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            ResidualBlock(256),

            nn.Conv2d(256, 512, 4, stride=2, padding=1), # 16->8
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            ResidualBlock(512),

            nn.Conv2d(512, 512, 4, stride=2, padding=1), # 8->4
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
        )

        self.fc_encoder = nn.Linear(512 * 4 * 4, latent_dim)
        self.fc_decoder = nn.Linear(latent_dim, 512 * 4 * 4)

        # Decoder
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 512, 4, stride=2, padding=1), # 4->8
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            ResidualBlock(512),

            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1), # 8->16
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            ResidualBlock(256),

            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), # 16->32
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            ResidualBlock(128),

            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 32->64
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            ResidualBlock(64),

            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),    # 64->128
            nn.Tanh()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = x.view(x.size(0), -1)
        latent = self.fc_encoder(x)
        x = self.fc_decoder(latent)
        x = x.view(x.size(0), 512, 4, 4)
        reconstructed = self.decoder(x)
        return reconstructed, latent

# Download and load the model
checkpoint_path = hf_hub_download(
    repo_id="ash12321/deepfake-autoencoder-cifar10-v2",
    filename="model_best_checkpoint.ckpt"
)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = ResidualConvAutoencoder(latent_dim=512).to(device)

checkpoint = torch.load(checkpoint_path, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

print("Model loaded successfully!")
```
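
As a quick sanity check, the loaded model's parameter count should match the figure quoted under Architecture:

```python
# Expected to print 34,849,667 (see Architecture above)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,}")
```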

### Inference Example

```python
from torchvision import transforms
from PIL import Image

# Prepare image
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

image = Image.open("your_image.jpg").convert('RGB')
input_tensor = transform(image).unsqueeze(0).to(device)

# Get reconstruction
with torch.no_grad():
    reconstructed, latent = model(input_tensor)

# Denormalize for visualization
reconstructed = (reconstructed * 0.5) + 0.5
```
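
To inspect or save the output, the denormalized tensor can be converted back to a PIL image; a small follow-up to the snippet above:

```python
# reconstructed has shape (1, 3, 128, 128) in [0, 1] after denormalization
output_image = transforms.ToPILImage()(reconstructed.squeeze(0).clamp(0, 1).cpu())
output_image.save("reconstruction.png")
```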

## Reconstruction Examples

![Reconstruction Comparison](reconstruction_comparison.png)

The image above shows 10 original CIFAR-10 test images (top row) and their reconstructions (bottom row), illustrating the model's reconstruction quality on in-distribution data.

## Applications

- **Deepfake Detection**: Use reconstruction error as a signal for detecting manipulated images (see the sketch below)
- **Anomaly Detection**: Identify out-of-distribution images based on reconstruction quality
- **Image Compression**: Compress images to 512-dimensional latent vectors
- **Feature Extraction**: Use the encoder as a feature extractor for downstream tasks
- **Image Denoising**: Potential for removing noise through reconstruction
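
For the detection-style applications, a minimal sketch of reconstruction-error scoring is shown below; it reuses `model` and `input_tensor` from the snippets above, and the threshold value is a hypothetical placeholder that would need calibration on trusted real images:

```python
import torch.nn.functional as F

def reconstruction_error(model, x):
    """Per-image MSE between input and reconstruction (higher = more anomalous)."""
    with torch.no_grad():
        recon, _ = model(x)
    return F.mse_loss(recon, x, reduction='none').mean(dim=(1, 2, 3))

THRESHOLD = 0.01  # hypothetical; calibrate on a held-out set of real images
error = reconstruction_error(model, input_tensor)
print(f"reconstruction error: {error.item():.6f}, flagged: {(error > THRESHOLD).item()}")
```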

## Limitations

- Trained specifically on CIFAR-10 (32x32 images upscaled to 128x128)
- May not generalize well to real-world high-resolution images without fine-tuning
- Optimized for natural images; performance on synthetic/generated images varies
- Reconstruction quality degrades for images significantly different from the CIFAR-10 distribution

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{deepfake-autoencoder-cifar10-v2,
  author       = {ash12321},
  title        = {Residual Convolutional Autoencoder for Deepfake Detection},
  year         = {2024},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ash12321/deepfake-autoencoder-cifar10-v2}}
}
```

## Model Card Authors

- **ash12321**

## Model Card Contact

For questions or issues, please open an issue in the repository.

---

*Model trained on December 08, 2025*