---
tags:
- image-classification
- fake-detection
- anomaly-detection
- one-class-learning
- deepfake-detection
- computer-vision
license: mit
---

# 🎯 Fake Image Detection Ensemble (9 Models)

An ensemble of 9 specialized models for detecting fake/AI-generated images using **single-class anomaly detection**. The models are trained only on real images to learn what "normal" looks like; fakes are then flagged as anomalies.

## πŸ“Š Performance

| Metric | Score |
|--------|-------|
| **Accuracy** | 67.05% |
| **Precision** | 87.97% |
| **Recall** | 39.50% |
| **F1 Score** | 54.52% |

### Confusion Matrix
- True Negatives: 946 (real correctly identified)
- False Positives: 54 (real misclassified as fake)
- False Negatives: 605 (fake misclassified as real)
- True Positives: 395 (fake correctly identified)

## πŸ—οΈ Architecture

The ensemble combines 9 specialized models using different detection strategies:

### Deep Learning Models (3):
1. **Enhanced Frequency VAE** - Multi-scale frequency analysis with phase information
   - Uses both magnitude and phase of FFT
   - Spectral consistency loss
   - Detects frequency-domain artifacts

2. **Edge Normalizing Flow** - Probability density estimation on edge features
   - Multi-scale edge analysis
   - Normalizing flow architecture
   - Detects unnatural edge patterns

3. **Semantic Deep SVDD** - ResNet50-based hypersphere anomaly detection
   - Semantic feature extraction
   - One-class deep learning
   - Detects high-level semantic anomalies
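
The magnitude-and-phase idea behind the Frequency VAE can be illustrated with a small dependency-free sketch. This is a naive 1-D DFT for clarity only; the actual model presumably operates on 2-D FFTs of images (e.g. via `torch.fft`), and the function below is not part of the released code:

```python
import cmath
import math

def dft_magnitude_phase(signal):
    """Naive 1-D DFT returning (magnitudes, phases) per frequency bin.
    Illustrative only -- a real pipeline would use a fast 2-D FFT."""
    n = len(signal)
    mags, phases = [], []
    for k in range(n):
        # X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n)
        xk = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                 for t in range(n))
        mags.append(abs(xk))
        phases.append(cmath.phase(xk))
    return mags, phases

# A pure cosine concentrates its energy in two symmetric frequency bins.
sig = [math.cos(2 * math.pi * t / 8) for t in range(8)]
mags, phases = dft_magnitude_phase(sig)
```

The sketch only shows how magnitude and phase are separated; in the model, statistics over both components feed the "frequency-domain artifacts" detection described above.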

### Traditional ML Models (6):
4. **Texture One-Class SVM** - Boundary-based detection
   - Enhanced texture features
   - RBF kernel
   - Tight decision boundary (nu=0.03)

5. **Isolation Forest** - Isolation-based anomaly detection
   - 200 estimators
   - Frequency + spatial features
   - Fast inference

6. **Local Outlier Factor** - Local density anomalies
   - Multi-scale patch analysis
   - Novelty detection mode
   - 20 neighbors

7. **Gaussian Mixture Model** - Distribution modeling
   - 10 components
   - Full covariance
   - Color distribution analysis

8. **Color Distribution Model** - Statistical color analysis
   - RGB histograms
   - Mahalanobis distance
   - Color moment analysis

9. **Statistical Model** - Edge and color statistics
   - Sobel edge detection
   - Multi-scale analysis
   - Mahalanobis distance
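
Two of the models above (the color distribution and statistical models) score images by Mahalanobis distance from statistics fitted on real images only. A minimal sketch of that scoring, simplified to a diagonal covariance (the actual models' feature sets and covariance structure are only described at the level above):

```python
import math

def fit_stats(feature_rows):
    """Per-dimension mean and variance, fitted on real-image features only."""
    n, d = len(feature_rows), len(feature_rows[0])
    means = [sum(row[j] for row in feature_rows) / n for j in range(d)]
    variances = [sum((row[j] - means[j]) ** 2 for row in feature_rows) / n
                 for j in range(d)]
    return means, variances

def mahalanobis(x, means, variances, eps=1e-9):
    """Mahalanobis distance under a diagonal-covariance simplification."""
    return math.sqrt(sum((xi - m) ** 2 / (v + eps)
                         for xi, m, v in zip(x, means, variances)))

# Toy features: "real" vectors cluster near (1.0, 2.0); an outlier scores far higher.
real = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 2.2]]
means, variances = fit_stats(real)
in_dist = mahalanobis([1.0, 2.0], means, variances)
outlier = mahalanobis([5.0, -3.0], means, variances)
```

The anomaly decision then reduces to thresholding this distance: images far from the real-image distribution are flagged as fake.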

## πŸŽ“ Training Details

- **Training Data**: 30,000 real images from COCO dataset
- **Training Approach**: Single-class anomaly detection (NO fake images used)
- **Validation Split**: 20% (6,000 images)
- **Test Set**: 1,000 real + 1,000 fake images (completely separate)
- **Training Time**: ~5-6 hours on GPU
- **Ensemble Method**: Weighted voting with adaptive threshold

### Model Training Times (Extended):
- Enhanced Frequency VAE: 45 minutes
- Texture One-Class SVM: 45 minutes
- Color Distribution Model: 30 minutes
- Edge Normalizing Flow: 45 minutes
- Semantic Deep SVDD: 45 minutes
- Statistical Model: 30 minutes
- Isolation Forest: 30 minutes
- Local Outlier Factor: 35 minutes
- Gaussian Mixture Model: 30 minutes
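
The ensemble method ("weighted voting with adaptive threshold", auto-calibrated at the 95th percentile as noted in this card) can be sketched roughly as follows. The model names, weights, and score normalization here are illustrative, not the shipped configuration:

```python
def calibrate_threshold(validation_scores, percentile=95):
    """Adaptive threshold: a high percentile of ensemble scores on
    real validation images, so ~5% of real images sit above it."""
    s = sorted(validation_scores)
    idx = min(len(s) - 1, int(round(percentile / 100 * (len(s) - 1))))
    return s[idx]

def ensemble_score(model_scores, weights):
    """Weighted vote over per-model anomaly scores (assumed already
    normalized to a comparable range)."""
    total = sum(weights.values())
    return sum(weights[name] * score
               for name, score in model_scores.items()) / total

# Hypothetical per-model scores for one image (higher = more anomalous).
scores = {"freq_vae": 0.8, "edge_flow": 0.6, "semantic_svdd": 0.7}
weights = {"freq_vae": 2.0, "edge_flow": 1.0, "semantic_svdd": 1.5}

val_scores = [0.1 * i for i in range(100)]   # scores on real validation images
threshold = calibrate_threshold(val_scores)  # 9.4 on this toy distribution
is_fake = ensemble_score(scores, weights) > threshold
```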

## πŸš€ Quick Start

```python
import torch
from torchvision import transforms
from PIL import Image
import pickle
import json
from huggingface_hub import hf_hub_download

# Configuration
repo_id = "ash12321/fake-image-detection-ensemble"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Download and load config
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
with open(config_path, 'r') as f:
    config = json.load(f)

# Load models (you need the model class definitions)
# Example for one model:
vae_path = hf_hub_download(repo_id=repo_id, filename="freq_vae.pth")
# freq_vae = EnhancedFreqVAE()
# freq_vae.load_state_dict(torch.load(vae_path, map_location=device))
# freq_vae.to(device)

# Load all other models similarly...

# Predict on a new image
img = Image.open('test_image.jpg').convert('RGB')
img = img.resize((256, 256), Image.LANCZOS)

tfm = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
img_tensor = tfm(img).unsqueeze(0).to(device)  # add batch dimension

# Get the ensemble prediction (assumes the loaded models have been
# assembled into an `ensemble` object exposing a `predict` method)
is_fake, score, individual_scores = ensemble.predict(img_tensor, device)
print(f"Prediction: {'FAKE' if is_fake else 'REAL'}")
print(f"Anomaly Score: {score:.4f}")
print(f"Individual model scores: {individual_scores}")
```

## πŸ“¦ Model Files

| File | Description | Size |
|------|-------------|------|
| `freq_vae.pth` | Enhanced Frequency VAE weights | ~100 MB |
| `semantic_svdd.pth` | Semantic Deep SVDD weights | ~90 MB |
| `edge_flow.pth` | Edge Normalizing Flow weights | ~5 MB |
| `texture_ocsvm.pkl` | Texture One-Class SVM | ~200 MB |
| `iforest.pkl` | Isolation Forest | ~150 MB |
| `lof.pkl` | Local Outlier Factor | ~180 MB |
| `gmm.pkl` | Gaussian Mixture Model | ~50 MB |
| `color_model.pkl` | Color Distribution Model | ~10 MB |
| `stat.pkl` | Statistical Model | ~5 MB |
| `config.json` | Ensemble configuration | <1 MB |
| `results_summary.json` | Training metrics | <1 MB |

## πŸ”§ Requirements

```
torch>=2.0.0
torchvision>=0.15.0
numpy>=1.24.0
pillow>=9.0.0
scikit-learn>=1.3.0
scipy>=1.10.0
huggingface_hub>=0.19.0
```

## 🎯 Use Cases

- **Deepfake Detection**: Identify AI-generated faces
- **Image Forensics**: Detect manipulated images
- **Content Moderation**: Filter synthetic content
- **Research**: Study AI-generated image characteristics
- **Quality Control**: Verify image authenticity

## ⚠️ Limitations

- Trained on COCO real images - performance may vary on other domains
- Requires 256Γ—256 input resolution
- May struggle with heavily compressed or low-quality images
- Performance depends on similarity between training and test distributions
- Not designed for adversarial attacks

## πŸ“ˆ Model Improvements

This version includes several accuracy enhancements:

1. **Phase Information**: VAE uses both magnitude and phase of FFT
2. **Enhanced Features**: More comprehensive texture and edge features
3. **Adaptive Threshold**: Auto-calibrated at 95th percentile
4. **Optimized Weights**: Balanced ensemble voting
5. **Extended Training**: Up to 45 minutes per model for better convergence

## πŸ“ Citation

```bibtex
@misc{fake-detection-ensemble-2024,
  author = {ash12321},
  title = {Fake Image Detection Ensemble - 9 Model System},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ash12321/fake-image-detection-ensemble}}
}
```

## πŸ“„ License

MIT License - Free for research and commercial use

## πŸ™ Acknowledgments

- COCO Dataset for training data
- PyTorch and scikit-learn communities
- Hugging Face for model hosting

## πŸ“ž Contact

Questions? Issues? Open an issue or discussion on this repository!

---

**Note**: This model was trained using single-class learning, which helps it generalize to types of fake images not seen during training. The ensemble combines multiple detection strategies to improve robustness and reliability.