---
tags:
- image-classification
- fake-detection
- anomaly-detection
- one-class-learning
- deepfake-detection
- computer-vision
license: mit
---

# Fake Image Detection Ensemble (9 Models)

An ensemble of 9 specialized models for detecting fake/AI-generated images using **single-class (one-class) anomaly detection**. The models are trained only on real images to learn what "normal" looks like. They then flag images that deviate from it as likely fakes.

## Performance

These are the results on the held-out test set (1,000 real + 1,000 fake images):

| Metric | Score |
|--------|-------|
| **Accuracy** | 67.05% |
| **Precision** | 87.97% |
| **Recall** | 39.50% |
| **F1 Score** | 54.52% |

### Confusion Matrix

Fake images are the positive class.

- True Negatives: 946 (real correctly identified)
- False Positives: 54 (real misclassified as fake)
- False Negatives: 605 (fake misclassified as real)
- True Positives: 395 (fake correctly identified)

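The reported metrics follow directly from the confusion matrix, with fake as the positive class. This short Python check recomputes them and also derives the false positive rate:

```python
# Recompute the reported test-set metrics from the confusion matrix.
# "Fake" is the positive class, matching the counts above.
tn, fp, fn, tp = 946, 54, 605, 395

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)                 # true positive rate on fake images
f1 = 2 * precision * recall / (precision + recall)
false_positive_rate = fp / (fp + tn)    # share of real images flagged as fake

for name, value in [
    ("Accuracy", accuracy),
    ("Precision", precision),
    ("Recall", recall),
    ("F1 Score", f1),
    ("False positive rate", false_positive_rate),
]:
    print(f"{name}: {value:.2%}")
```

The gap between precision and recall shows that the detector is conservative. It flags only about 5.4% of real images, which is roughly what a threshold at the 95th percentile of real-image scores would be expected to give. However, it misses most fakes.
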
## Architecture

The ensemble combines 9 specialized models that use different detection strategies:

### Deep Learning Models (3):

1. **Enhanced Frequency VAE** - Multi-scale frequency analysis with phase information
   - Uses both magnitude and phase of the FFT
   - Spectral consistency loss
   - Detects frequency-domain artifacts

2. **Edge Normalizing Flow** - Probability density estimation on edge features
   - Multi-scale edge analysis
   - Normalizing flow architecture
   - Detects unnatural edge patterns

3. **Semantic Deep SVDD** - ResNet50-based hypersphere anomaly detection
   - Semantic feature extraction
   - One-class deep learning
   - Detects high-level semantic anomalies

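The Enhanced Frequency VAE is described as working on the magnitude and phase of the FFT at multiple scales. The following NumPy sketch illustrates what such an input representation can look like. The helper names, the grayscale conversion, the log scaling of the magnitude, and the naive stride downsampling are all assumptions for illustration. They are not the repository's actual preprocessing.

```python
import numpy as np

def fft_magnitude_phase(image: np.ndarray) -> np.ndarray:
    """Hypothetical helper: stack log-magnitude and phase of a 2D FFT.

    `image` is an H x W x 3 float array in [0, 1]. It is reduced to
    grayscale for simplicity (an assumption, not the model's recipe).
    Returns an array of shape (2, H, W): [log-magnitude, phase].
    """
    gray = image @ np.array([0.299, 0.587, 0.114])  # luminance, shape (H, W)
    spectrum = np.fft.fftshift(np.fft.fft2(gray))  # center low frequencies
    magnitude = np.log1p(np.abs(spectrum))          # compress dynamic range
    phase = np.angle(spectrum)                      # values in [-pi, pi]
    return np.stack([magnitude, phase])

def multi_scale_features(image: np.ndarray, scales=(1, 2, 4)) -> list:
    """Hypothetical helper: magnitude/phase maps at several scales.

    Downsampling is a naive stride, used here only to keep the sketch short.
    """
    return [fft_magnitude_phase(image[::s, ::s]) for s in scales]

rng = np.random.default_rng(0)
img = rng.random((256, 256, 3))  # stand-in for a preprocessed 256x256 image
features = multi_scale_features(img)
print([f.shape for f in features])  # [(2, 256, 256), (2, 128, 128), (2, 64, 64)]
```

Keeping the phase channel matters because magnitude alone discards spatial arrangement. Some generation artifacts may show up as inconsistent phase structure even when the magnitude spectrum looks plausible.
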
### Traditional ML Models (6):

4. **Texture One-Class SVM** - Boundary-based detection
   - Enhanced texture features
   - RBF kernel
   - Tight decision boundary (nu=0.03)

5. **Isolation Forest** - Isolation-based anomaly detection
   - 200 estimators
   - Frequency + spatial features
   - Fast inference

6. **Local Outlier Factor** - Local density anomalies
   - Multi-scale patch analysis
   - Novelty detection mode
   - 20 neighbors

7. **Gaussian Mixture Model** - Distribution modeling
   - 10 components
   - Full covariance
   - Color distribution analysis

8. **Color Distribution Model** - Statistical color analysis
   - RGB histograms
   - Mahalanobis distance
   - Color moment analysis

9. **Statistical Model** - Edge and color statistics
   - Sobel edge detection
   - Multi-scale analysis
   - Mahalanobis distance

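The traditional detectors correspond to standard scikit-learn estimators with the hyperparameters listed above. The minimal sketch below shows that mapping. It assumes generic 16-dimensional feature vectors, with random data standing in for the real texture, frequency, and color features, which are not specified here. Each detector's output is negated where needed so that a higher score always means "more anomalous". The `mahalanobis` helper illustrates the distance used by the Color Distribution and Statistical models.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Placeholder features for real training images (assumption: 16-dim vectors).
X_real = rng.normal(size=(1000, 16))

# Hyperparameters taken from the model descriptions above.
ocsvm = OneClassSVM(kernel="rbf", nu=0.03, gamma="scale").fit(X_real)
iforest = IsolationForest(n_estimators=200, random_state=0).fit(X_real)
lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_real)
gmm = GaussianMixture(n_components=10, covariance_type="full", random_state=0).fit(X_real)

# Mahalanobis distance to the real-image feature distribution.
mean = X_real.mean(axis=0)
cov_inv = np.linalg.pinv(np.cov(X_real, rowvar=False))

def mahalanobis(X: np.ndarray) -> np.ndarray:
    diff = X - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

def anomaly_scores(X: np.ndarray) -> dict:
    """Per-detector scores; higher = more anomalous for every detector."""
    return {
        "ocsvm": -ocsvm.decision_function(X),
        "iforest": -iforest.score_samples(X),
        "lof": -lof.score_samples(X),  # score_samples returns the negated LOF
        "gmm": -gmm.score_samples(X),  # negative log-likelihood
        "mahalanobis": mahalanobis(X),
    }

# One typical point and one far-off point.
X_test = np.vstack([np.zeros(16), np.full(16, 6.0)])
for name, s in anomaly_scores(X_test).items():
    print(f"{name:12s} typical={s[0]:.3f} outlier={s[1]:.3f}")
```

Aligning the sign conventions in one place makes the raw scores comparable in direction before they are combined. Their scales still differ, so an ensemble would also need some form of per-model normalization.
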
## Training Details

- **Training Data**: 30,000 real images from the COCO dataset
- **Training Approach**: Single-class anomaly detection (NO fake images used)
- **Validation Split**: 20% (6,000 images)
- **Test Set**: 1,000 real + 1,000 fake images (completely separate)
- **Training Time**: ~5-6 hours on GPU
- **Ensemble Method**: Weighted voting with adaptive threshold

### Model Training Times (Extended):

- Enhanced Frequency VAE: 45 minutes
- Texture One-Class SVM: 45 minutes
- Color Distribution Model: 30 minutes
- Edge Normalizing Flow: 45 minutes
- Semantic Deep SVDD: 45 minutes
- Statistical Model: 30 minutes
- Isolation Forest: 30 minutes
- Local Outlier Factor: 35 minutes
- Gaussian Mixture Model: 30 minutes

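If the models were trained one after another, the per-model times add up to the overall figure quoted above. The card does not say whether any of them ran in parallel, so this is an assumption. The 20% validation split of the 30,000 images works out as follows:

```python
# Per-model training times from the list above, in minutes.
training_minutes = {
    "Enhanced Frequency VAE": 45,
    "Texture One-Class SVM": 45,
    "Color Distribution Model": 30,
    "Edge Normalizing Flow": 45,
    "Semantic Deep SVDD": 45,
    "Statistical Model": 30,
    "Isolation Forest": 30,
    "Local Outlier Factor": 35,
    "Gaussian Mixture Model": 30,
}
total = sum(training_minutes.values())
print(f"Total: {total} min (~{total / 60:.1f} h)")  # Total: 335 min (~5.6 h)

# 80/20 split of the 30,000 real COCO images.
n_images = 30_000
n_val = n_images * 20 // 100
n_train = n_images - n_val
print(f"Train: {n_train:,}  Validation: {n_val:,}")  # Train: 24,000  Validation: 6,000
```
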
## Quick Start

```python
import json
import pickle

import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from torchvision import transforms

# Configuration
repo_id = "ash12321/fake-image-detection-ensemble"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Download and load the ensemble configuration
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
with open(config_path, "r") as f:
    config = json.load(f)

# Load models. The PyTorch models need their class definitions,
# which are not included in this snippet. Example for one model:
vae_path = hf_hub_download(repo_id=repo_id, filename="freq_vae.pth")
# freq_vae = EnhancedFreqVAE()
# freq_vae.load_state_dict(torch.load(vae_path, map_location=device))
# freq_vae.to(device).eval()

# The scikit-learn models are stored as pickles. If a pickle references
# custom classes, those definitions must be importable as well.
iforest_path = hf_hub_download(repo_id=repo_id, filename="iforest.pkl")
with open(iforest_path, "rb") as f:
    iforest = pickle.load(f)

# Load all other models similarly...

# Preprocess a new image: convert to RGB first, then resize to 256x256
img = Image.open("test_image.jpg").convert("RGB")
img = img.resize((256, 256), Image.LANCZOS)

tfm = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),  # ImageNet mean/std
])
img_tensor = tfm(img)  # shape: (3, 256, 256)

# Get a prediction from the ensemble. `ensemble` stands for an object that
# wraps all nine loaded models; it is not defined in this snippet.
# is_fake, score, individual_scores = ensemble.predict(img_tensor, device)
# print(f"Prediction: {'FAKE' if is_fake else 'REAL'}")
# print(f"Anomaly Score: {score:.4f}")
# print(f"Individual model scores: {individual_scores}")
```

## Model Files

| File | Description | Size |
|------|-------------|------|
| `freq_vae.pth` | Enhanced Frequency VAE weights | ~100 MB |
| `semantic_svdd.pth` | Semantic Deep SVDD weights | ~90 MB |
| `edge_flow.pth` | Edge Normalizing Flow weights | ~5 MB |
| `texture_ocsvm.pkl` | Texture One-Class SVM | ~200 MB |
| `iforest.pkl` | Isolation Forest | ~150 MB |
| `lof.pkl` | Local Outlier Factor | ~180 MB |
| `gmm.pkl` | Gaussian Mixture Model | ~50 MB |
| `color_model.pkl` | Color Distribution Model | ~10 MB |
| `stat.pkl` | Statistical Model | ~5 MB |
| `config.json` | Ensemble configuration | <1 MB |
| `results_summary.json` | Training metrics | <1 MB |

## Requirements

```
torch>=2.0.0
torchvision>=0.15.0
numpy>=1.24.0
pillow>=9.0.0
scikit-learn>=1.3.0
scipy>=1.10.0
huggingface_hub>=0.19.0
```

## Use Cases

- **Deepfake Detection**: Identify AI-generated faces
- **Image Forensics**: Detect manipulated images
- **Content Moderation**: Filter synthetic content
- **Research**: Study AI-generated image characteristics
- **Quality Control**: Verify image authenticity

## Limitations

- Trained on real images from COCO; performance may vary on other domains
- Requires 256×256 input resolution
- Low recall (39.50% on the test set): most fake test images were classified as real
- May struggle with heavily compressed or low-quality images
- Performance depends on how closely the test distribution matches the training distribution
- Not designed to be robust against adversarial attacks

## Model Improvements

This version includes several accuracy enhancements:

1. **Phase Information**: The VAE uses both magnitude and phase of the FFT
2. **Enhanced Features**: More comprehensive texture and edge features
3. **Adaptive Threshold**: Auto-calibrated at the 95th percentile
4. **Optimized Weights**: Balanced ensemble voting
5. **Extended Training**: Up to 45 minutes per model for better convergence

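The card does not specify how per-model scores are normalized or how the weights are set. The sketch below shows one common way to combine anomaly detectors with an adaptive threshold. Each model's scores are standardized using statistics from the real-image validation set and merged with a weighted average. The decision threshold is then set at the 95th percentile of the combined validation scores. The function names, the z-score normalization, the use of score averaging rather than a vote over binary decisions, and the equal weights are all hypothetical.

```python
import numpy as np

def fit_normalizer(val_scores: np.ndarray):
    """Per-model mean/std from real validation images; shape (n_images, n_models)."""
    return val_scores.mean(axis=0), val_scores.std(axis=0) + 1e-12

def combine(scores: np.ndarray, mean, std, weights: np.ndarray) -> np.ndarray:
    """Weighted average of z-scored anomaly scores (higher = more likely fake)."""
    z = (scores - mean) / std
    return z @ (weights / weights.sum())

rng = np.random.default_rng(0)
n_models = 9
weights = np.ones(n_models)  # hypothetical "balanced" weights

# Placeholder anomaly scores for 6,000 real validation images.
val_scores = rng.normal(size=(6000, n_models))
mean, std = fit_normalizer(val_scores)
val_combined = combine(val_scores, mean, std, weights)

# Adaptive threshold: 95th percentile of combined scores on real images,
# so about 5% of real images are expected to be flagged as fake.
threshold = np.percentile(val_combined, 95)

def predict(scores: np.ndarray) -> np.ndarray:
    """Boolean array: True where the combined score exceeds the threshold."""
    return combine(scores, mean, std, weights) > threshold

flag_rate = predict(val_scores).mean()
print(f"threshold={threshold:.3f}, real images flagged: {flag_rate:.1%}")
```

The percentile sets the trade-off directly. A higher percentile flags fewer real images but also catches fewer fakes, which lowers recall, and a lower percentile does the reverse.
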
## Citation

```bibtex
@misc{fake-detection-ensemble-2024,
  author = {ash12321},
  title = {Fake Image Detection Ensemble - 9 Model System},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ash12321/fake-image-detection-ensemble}}
}
```

## License

MIT License - Free for research and commercial use

## Acknowledgments

- COCO Dataset for training data
- PyTorch and scikit-learn communities
- Hugging Face for model hosting

## Contact

Questions? Issues? Open an issue or discussion on this repository!

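---

**Note**: This model was trained using single-class learning, so it does not rely on examples from any particular image generator. This is intended to help it flag new types of fake images that were not seen during training. The ensemble combines multiple detection strategies (frequency, edge, semantic, texture, color, and statistical features) so that the models complement one another. On the test set, this yields high precision (87.97%) but modest recall (39.50%).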