---
tags:
- image-classification
- fake-detection
- anomaly-detection
- one-class-learning
- deepfake-detection
- computer-vision
license: mit
---
# Fake Image Detection Ensemble (9 Models)
An ensemble of 9 specialized models for detecting fake or AI-generated images using **single-class anomaly detection**. The models are trained only on real images to learn what "normal" looks like, and images that deviate from that are flagged as fake.
## Performance
| Metric | Score |
|--------|-------|
| **Accuracy** | 67.05% |
| **Precision** | 87.97% |
| **Recall** | 39.50% |
| **F1 Score** | 54.52% |
### Confusion Matrix
- True Negatives: 946 (real correctly identified)
- False Positives: 54 (real misclassified as fake)
- False Negatives: 605 (fake misclassified as real)
- True Positives: 395 (fake correctly identified)
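The four scores in the table follow directly from the confusion matrix counts. A minimal sketch of that arithmetic, treating "fake" as the positive class (the function name `classification_metrics` is illustrative and not part of this repository):

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 for the positive (fake) class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of "fake" predictions that were right
    recall = tp / (tp + fn)      # share of actual fakes that were caught
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts taken from the confusion matrix above
metrics = classification_metrics(tp=395, fp=54, fn=605, tn=946)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

The gap between precision and recall means an image flagged as fake is usually fake, but most fakes in the test set were not flagged.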
## Architecture
The ensemble combines 9 specialized models using different detection strategies:
### Deep Learning Models (3):
1. **Enhanced Frequency VAE** - Multi-scale frequency analysis with phase information
   - Uses both magnitude and phase of FFT
   - Spectral consistency loss
   - Detects frequency-domain artifacts
2. **Edge Normalizing Flow** - Probability density estimation on edge features
   - Multi-scale edge analysis
   - Normalizing flow architecture
   - Detects unnatural edge patterns
3. **Semantic Deep SVDD** - ResNet50-based hypersphere anomaly detection
   - Semantic feature extraction
   - One-class deep learning
   - Detects high-level semantic anomalies
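As a rough illustration of the frequency-domain input described for the first model, the sketch below computes the log-magnitude and phase of a 2-D FFT with NumPy. The function name, the log scaling, and the two-channel layout are assumptions made for illustration; the actual preprocessing used by the Enhanced Frequency VAE is not shown in this card.

```python
import numpy as np


def fft_magnitude_phase(gray):
    """Return log-magnitude and phase of the centred 2-D FFT of a grayscale image."""
    spectrum = np.fft.fftshift(np.fft.fft2(gray))  # move the DC term to the centre
    log_magnitude = np.log1p(np.abs(spectrum))     # compress the dynamic range
    phase = np.angle(spectrum)                     # values in [-pi, pi]
    return log_magnitude, phase


rng = np.random.default_rng(0)
gray = rng.random((256, 256))  # stand-in for a real 256x256 grayscale image

log_mag, phase = fft_magnitude_phase(gray)
features = np.stack([log_mag, phase])  # hypothetical 2-channel model input
print(features.shape)
```

Keeping the phase alongside the magnitude preserves spatial structure that the magnitude spectrum alone discards.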
### Traditional ML Models (6):
4. **Texture One-Class SVM** - Boundary-based detection
   - Enhanced texture features
   - RBF kernel
   - Tight decision boundary (nu=0.03)
5. **Isolation Forest** - Isolation-based anomaly detection
   - 200 estimators
   - Frequency + spatial features
   - Fast inference
6. **Local Outlier Factor** - Local density anomalies
   - Multi-scale patch analysis
   - Novelty detection mode
   - 20 neighbors
7. **Gaussian Mixture Model** - Distribution modeling
   - 10 components
   - Full covariance
   - Color distribution analysis
8. **Color Distribution Model** - Statistical color analysis
   - RGB histograms
   - Mahalanobis distance
   - Color moment analysis
9. **Statistical Model** - Edge and color statistics
   - Sobel edge detection
   - Multi-scale analysis
   - Mahalanobis distance
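Models 8 and 9 both rely on the Mahalanobis distance, which measures how far a feature vector lies from the distribution of real-image features. A minimal sketch with NumPy follows; the class name `MahalanobisScorer`, the ridge term, and the random stand-in features are assumptions for illustration, not the repository's implementation.

```python
import numpy as np


class MahalanobisScorer:
    """Scores samples by Mahalanobis distance to the training (real-image) features."""

    def fit(self, features):
        self.mean_ = features.mean(axis=0)
        cov = np.cov(features, rowvar=False)
        # Small ridge keeps the covariance invertible (an assumption of this sketch).
        cov += 1e-6 * np.eye(cov.shape[0])
        self.precision_ = np.linalg.inv(cov)
        return self

    def score(self, features):
        diff = features - self.mean_
        # Per-row quadratic form: diff_i^T * precision * diff_i
        squared = np.einsum("ij,jk,ik->i", diff, self.precision_, diff)
        return np.sqrt(np.maximum(squared, 0.0))


rng = np.random.default_rng(0)
real_features = rng.normal(size=(1000, 6))  # stand-in for real-image features
scorer = MahalanobisScorer().fit(real_features)

typical_score = scorer.score(np.zeros((1, 6)))[0]       # near the training mean
outlier_score = scorer.score(np.full((1, 6), 5.0))[0]   # far from the training mean
print(typical_score < outlier_score)
```

A larger distance means the sample is less like the real images seen during training, so it can be used directly as an anomaly score.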
## Training Details
- **Training Data**: 30,000 real images from COCO dataset
- **Training Approach**: Single-class anomaly detection (NO fake images used)
- **Validation Split**: 20% (6,000 images)
- **Test Set**: 1,000 real + 1,000 fake images (completely separate)
- **Training Time**: ~5-6 hours on GPU
- **Ensemble Method**: Weighted voting with adaptive threshold
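Weighted voting with a percentile-based threshold can be sketched as follows. This is an illustration under stated assumptions: the per-model scores are already on a comparable scale, the weights are equal, and the threshold is the 95th percentile of ensemble scores on held-out real images. The function names and the synthetic scores are hypothetical; the weights and threshold actually used by this ensemble are not shown here.

```python
import numpy as np


def ensemble_score(model_scores, weights):
    """Weighted average of per-model anomaly scores (rows = images, columns = models)."""
    weights = np.asarray(weights, dtype=float)
    return np.asarray(model_scores, dtype=float) @ (weights / weights.sum())


def calibrate_threshold(real_scores, percentile=95.0):
    """Pick the score below which `percentile` percent of real images fall."""
    return float(np.percentile(real_scores, percentile))


rng = np.random.default_rng(0)
weights = np.ones(9)                               # equal weights (assumption)
val_real = rng.normal(0.0, 1.0, size=(6000, 9))    # synthetic validation scores

threshold = calibrate_threshold(ensemble_score(val_real, weights))

# Two synthetic test images: one typical, one scored as anomalous by every model
test_scores = ensemble_score(np.vstack([np.zeros(9), np.full(9, 3.0)]), weights)
is_fake = test_scores > threshold
print(is_fake)
```

Calibrating on real images only fixes the expected false positive rate at roughly 5% without needing any fake images, which fits the single-class training approach.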
### Model Training Times (Extended):
- Enhanced Frequency VAE: 45 minutes
- Texture One-Class SVM: 45 minutes
- Color Distribution Model: 30 minutes
- Edge Normalizing Flow: 45 minutes
- Semantic Deep SVDD: 45 minutes
- Statistical Model: 30 minutes
- Isolation Forest: 30 minutes
- Local Outlier Factor: 35 minutes
- Gaussian Mixture Model: 30 minutes
## Quick Start
```python
import torch
from torchvision import transforms
from PIL import Image
import pickle  # needed for the .pkl models
import json
from huggingface_hub import hf_hub_download

# Configuration
repo_id = "ash12321/fake-image-detection-ensemble"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Download and load config
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
with open(config_path, 'r') as f:
    config = json.load(f)

# Load models (you need the model class definitions)
# Example for one model:
vae_path = hf_hub_download(repo_id=repo_id, filename="freq_vae.pth")
# freq_vae = EnhancedFreqVAE()
# freq_vae.load_state_dict(torch.load(vae_path, map_location=device))
# freq_vae.to(device)
# Load all other models similarly...

# Predict on new image (convert to RGB first, then resize to 256x256)
img = Image.open('test_image.jpg').convert('RGB')
img = img.resize((256, 256), Image.LANCZOS)
tfm = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
img_tensor = tfm(img)

# Get prediction from ensemble
# NOTE: `ensemble` is not defined in this snippet; build it from the
# loaded models and the ensemble class definition before calling predict.
is_fake, score, individual_scores = ensemble.predict(img_tensor, device)
print(f"Prediction: {'FAKE' if is_fake else 'REAL'}")
print(f"Anomaly Score: {score:.4f}")
print(f"Individual model scores: {individual_scores}")
```
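The traditional models are stored as `.pkl` files, so they are loaded with `pickle` rather than `torch.load`. The helper below is a minimal sketch: the name `load_pickled_model` is hypothetical, and the round trip uses a stand-in object so that it runs offline. Unpickling can execute arbitrary code, so only load files from a source you trust, and note that restoring scikit-learn objects requires a compatible scikit-learn version to be installed.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickled_model(path):
    """Unpickle a model file. Only use this on files from a source you trust."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


# Round-trip demo with a stand-in object so the sketch runs without network access.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.pkl"
    demo_path.write_bytes(pickle.dumps({"name": "stand-in model"}))
    restored = load_pickled_model(demo_path)

print(restored["name"])

# With the repository files, the call would look like this:
# iforest = load_pickled_model(hf_hub_download(repo_id=repo_id, filename="iforest.pkl"))
```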
## Model Files
| File | Description | Size |
|------|-------------|------|
| `freq_vae.pth` | Enhanced Frequency VAE weights | ~100 MB |
| `semantic_svdd.pth` | Semantic Deep SVDD weights | ~90 MB |
| `edge_flow.pth` | Edge Normalizing Flow weights | ~5 MB |
| `texture_ocsvm.pkl` | Texture One-Class SVM | ~200 MB |
| `iforest.pkl` | Isolation Forest | ~150 MB |
| `lof.pkl` | Local Outlier Factor | ~180 MB |
| `gmm.pkl` | Gaussian Mixture Model | ~50 MB |
| `color_model.pkl` | Color Distribution Model | ~10 MB |
| `stat.pkl` | Statistical Model | ~5 MB |
| `config.json` | Ensemble configuration | <1 MB |
| `results_summary.json` | Training metrics | <1 MB |
## Requirements
```
torch>=2.0.0
torchvision>=0.15.0
numpy>=1.24.0
pillow>=9.0.0
scikit-learn>=1.3.0
scipy>=1.10.0
huggingface_hub>=0.19.0
```
## Use Cases
- **Deepfake Detection**: Identify AI-generated faces
- **Image Forensics**: Detect manipulated images
- **Content Moderation**: Filter synthetic content
- **Research**: Study AI-generated image characteristics
- **Quality Control**: Verify image authenticity
## Limitations
- Trained on COCO real images - performance may vary on other domains
- Requires 256×256 input resolution
- May struggle with heavily compressed or low-quality images
- Performance depends on similarity between training and test distributions
- Not designed to withstand adversarial attacks
## Model Improvements
This version includes several accuracy enhancements:
1. **Phase Information**: VAE uses both magnitude and phase of FFT
2. **Enhanced Features**: More comprehensive texture and edge features
3. **Adaptive Threshold**: Auto-calibrated at the 95th percentile
4. **Optimized Weights**: Balanced ensemble voting
5. **Extended Training**: Up to 45 minutes per model for better convergence
## Citation
```bibtex
@misc{fake-detection-ensemble-2024,
author = {ash12321},
title = {Fake Image Detection Ensemble - 9 Model System},
year = {2024},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/ash12321/fake-image-detection-ensemble}}
}
```
## License
MIT License - Free for research and commercial use
## Acknowledgments
- COCO Dataset for training data
- PyTorch and scikit-learn communities
- Hugging Face for model hosting
## Contact
Questions? Issues? Open an issue or discussion on this repository!
---
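**Note**: This model was trained using single-class learning on real images only, so it does not depend on any particular generator and may generalize to types of fake images not seen during training. The ensemble combines multiple detection strategies; on the reported test set it favors precision (87.97%) over recall (39.50%), so a "fake" verdict is usually correct but many fakes still pass as real.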