	---
	tags:
	- image-classification
	- fake-detection
	- anomaly-detection
	- one-class-learning
	- deepfake-detection
	- computer-vision
	license: mit
	---

	# 🎯 Fake Image Detection Ensemble (9 Models)

	An ensemble of 9 specialized models for detecting fake or AI-generated images with one-class anomaly detection. The models are trained only on real images to learn what "normal" looks like, and then flag fakes as anomalies.

	## 📊 Performance

	| Metric | Score |
	|--------|-------|
	| Accuracy | 67.05% |
	| Precision | 87.97% |
	| Recall | 39.50% |
	| F1 Score | 54.52% |

	### Confusion Matrix
	- True Negatives: 946 (real correctly identified)
	- False Positives: 54 (real misclassified as fake)
	- False Negatives: 605 (fake misclassified as real)
	- True Positives: 395 (fake correctly identified)
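
The scores in the Performance table follow directly from these four counts. The short check below recomputes them, treating "fake" as the positive class; the variable names are illustrative only.

```python
# Counts copied from the confusion matrix above ("fake" is the positive class).
tn, fp, fn, tp = 946, 54, 605, 395

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
false_positive_rate = fp / (fp + tn)

print(f"Accuracy:  {accuracy:.2%}")   # 67.05%
print(f"Precision: {precision:.2%}")  # 87.97%
print(f"Recall:    {recall:.2%}")     # 39.50%
print(f"F1 Score:  {f1:.2%}")         # 54.52%
print(f"False positive rate: {false_positive_rate:.2%}")  # 5.40%
```

The high precision and low recall mean that an image flagged as fake is usually fake, but roughly 60% of the fakes in the test set were not flagged.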

	## 🏗️ Architecture

	The ensemble combines 9 specialized models using different detection strategies:

	### Deep Learning Models (3):
	1. Enhanced Frequency VAE - Multi-scale frequency analysis with phase information
	- Uses both magnitude and phase of FFT
	- Spectral consistency loss
	- Detects frequency-domain artifacts

	2. Edge Normalizing Flow - Probability density estimation on edge features
	- Multi-scale edge analysis
	- Normalizing flow architecture
	- Detects unnatural edge patterns

	3. Semantic Deep SVDD - ResNet50-based hypersphere anomaly detection
	- Semantic feature extraction
	- One-class deep learning
	- Detects high-level semantic anomalies
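
To make "magnitude and phase of FFT" concrete, here is a minimal sketch of how such an input could be computed with NumPy. It is not the author's preprocessing: the function name `fft_magnitude_phase`, the log scaling, and the single-channel grayscale input are all assumptions made for illustration.

```python
import numpy as np

def fft_magnitude_phase(gray: np.ndarray) -> np.ndarray:
    """Return a 2-channel array: log-magnitude and phase of the centered 2D FFT.

    Hypothetical helper; the actual VAE input pipeline is not shown in this README.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(gray))  # move the zero frequency to the center
    log_magnitude = np.log1p(np.abs(spectrum))     # compress the large dynamic range
    phase = np.angle(spectrum)                     # radians in [-pi, pi]
    return np.stack([log_magnitude, phase], axis=0)

# Random data stands in for a 256x256 grayscale image.
rng = np.random.default_rng(0)
gray = rng.random((256, 256))
features = fft_magnitude_phase(gray)
print(features.shape)  # (2, 256, 256)
```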

	### Traditional ML Models (6):
	4. Texture One-Class SVM - Boundary-based detection
	- Enhanced texture features
	- RBF kernel
	- Tight decision boundary (nu=0.03)

	5. Isolation Forest - Isolation-based anomaly detection
	- 200 estimators
	- Frequency + spatial features
	- Fast inference

	6. Local Outlier Factor - Local density anomalies
	- Multi-scale patch analysis
	- Novelty detection mode
	- 20 neighbors

	7. Gaussian Mixture Model - Distribution modeling
	- 10 components
	- Full covariance
	- Color distribution analysis

	8. Color Distribution Model - Statistical color analysis
	- RGB histograms
	- Mahalanobis distance
	- Color moment analysis

	9. Statistical Model - Edge and color statistics
	- Sobel edge detection
	- Multi-scale analysis
	- Mahalanobis distance
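
The hyperparameters listed above map onto standard scikit-learn estimators. The sketch below fits them on placeholder Gaussian features, since the feature extractors are not shown in this README; `gamma="scale"`, the random seeds, and the feature dimensions are assumptions. It also shows a plain Mahalanobis distance of the kind models 8 and 9 mention.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Placeholder features standing in for texture/color/edge descriptors of real images.
real_features = rng.normal(size=(500, 8))
# First 5 rows resemble the training data, last 5 are shifted far away from it.
test_features = np.vstack([rng.normal(size=(5, 8)), rng.normal(loc=6.0, size=(5, 8))])

detectors = {
    "ocsvm": OneClassSVM(kernel="rbf", nu=0.03, gamma="scale"),
    "iforest": IsolationForest(n_estimators=200, random_state=0),
    "lof": LocalOutlierFactor(n_neighbors=20, novelty=True),
    "gmm": GaussianMixture(n_components=10, covariance_type="full", random_state=0),
}

scores = {}
for name, model in detectors.items():
    model.fit(real_features)  # one-class training: real samples only
    # score_samples is higher for more "normal" points in all four estimators,
    # so negate it to obtain an anomaly score (higher = more anomalous).
    scores[name] = -model.score_samples(test_features)

# Mahalanobis distance to the mean of the real features.
mean = real_features.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(real_features, rowvar=False))
diff = test_features - mean
scores["mahalanobis"] = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

for name, s in scores.items():
    print(f"{name}: in-distribution mean {s[:5].mean():.2f}, shifted mean {s[5:].mean():.2f}")
```

Each detector produces scores on its own scale, so an ensemble would need to normalize them before combining.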

	## 🎓 Training Details

	- Training Data: 30,000 real images from COCO dataset
	- Training Approach: Single-class anomaly detection (NO fake images used)
	- Validation Split: 20% (6,000 images)
	- Test Set: 1,000 real + 1,000 fake images (completely separate)
	- Training Time: ~5-6 hours on GPU
	- Ensemble Method: Weighted voting with adaptive threshold
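
One common way to calibrate a threshold without any fake images is to take a high percentile of the anomaly scores on held-out real images. The sketch below assumes that is what "adaptive threshold" means here, and uses the 95th percentile mentioned under Model Improvements; the validation scores are random placeholders, not the author's data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical ensemble anomaly scores for the 6,000 held-out real validation images.
val_scores_real = rng.normal(loc=0.0, scale=1.0, size=6000)

# Scores above the 95th percentile of real-image scores are treated as anomalous.
threshold = np.percentile(val_scores_real, 95)

def is_fake(score: float) -> bool:
    """Flag an image as fake when its anomaly score exceeds the calibrated threshold."""
    return bool(score > threshold)

flagged = float(np.mean(val_scores_real > threshold))
print(f"threshold = {threshold:.3f}, real validation images flagged = {flagged:.1%}")
```

Under this scheme about 5% of real validation images are flagged by construction, which is in line with the 54 false positives among 1,000 real test images reported above.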

	### Model Training Times (Extended):
	- Enhanced Frequency VAE: 45 minutes
	- Texture One-Class SVM: 45 minutes
	- Color Distribution Model: 30 minutes
	- Edge Normalizing Flow: 45 minutes
	- Semantic Deep SVDD: 45 minutes
	- Statistical Model: 30 minutes
	- Isolation Forest: 30 minutes
	- Local Outlier Factor: 35 minutes
	- Gaussian Mixture Model: 30 minutes
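
As a quick consistency check, the per-model times can be summed and compared with the overall "~5-6 hours" figure. This assumes the models were trained one after another rather than in parallel.

```python
# Per-model training times in minutes, copied from the list above.
minutes = {
    "Enhanced Frequency VAE": 45,
    "Texture One-Class SVM": 45,
    "Color Distribution Model": 30,
    "Edge Normalizing Flow": 45,
    "Semantic Deep SVDD": 45,
    "Statistical Model": 30,
    "Isolation Forest": 30,
    "Local Outlier Factor": 35,
    "Gaussian Mixture Model": 30,
}

total = sum(minutes.values())
hours, rem = divmod(total, 60)
print(f"{total} min = {hours} h {rem} min")  # 335 min = 5 h 35 min
```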

	## 🚀 Quick Start

	```python
	import json
	import pickle

	import torch
	from huggingface_hub import hf_hub_download
	from PIL import Image
	from torchvision import transforms

	# Configuration
	repo_id = "ash12321/fake-image-detection-ensemble"
	device = 'cuda' if torch.cuda.is_available() else 'cpu'

	# Download and load config
	config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
	with open(config_path, 'r') as f:
	    config = json.load(f)

	# Load the PyTorch models (you need the model class definitions)
	# Example for one model:
	vae_path = hf_hub_download(repo_id=repo_id, filename="freq_vae.pth")
	# freq_vae = EnhancedFreqVAE()
	# freq_vae.load_state_dict(torch.load(vae_path, map_location=device))
	# freq_vae.to(device)
	# freq_vae.eval()

	# Load a pickled model. Unpickling may also need the original class definitions;
	# only unpickle files from a source you trust.
	iforest_path = hf_hub_download(repo_id=repo_id, filename="iforest.pkl")
	with open(iforest_path, 'rb') as f:
	    iforest = pickle.load(f)

	# Load all other models similarly...

	# Preprocess a new image: RGB, 256x256, ImageNet normalization
	img = Image.open('test_image.jpg').convert('RGB')
	img = img.resize((256, 256), Image.LANCZOS)

	tfm = transforms.Compose([
	    transforms.ToTensor(),
	    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
	])
	img_tensor = tfm(img)

	# Get prediction from the ensemble.
	# NOTE: `ensemble` is not defined in this snippet. Build it from the loaded
	# models with the ensemble class before running the lines below.
	is_fake, score, individual_scores = ensemble.predict(img_tensor, device)
	print(f"Prediction: {'FAKE' if is_fake else 'REAL'}")
	print(f"Anomaly Score: {score:.4f}")
	print(f"Individual model scores: {individual_scores}")
	```
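
The Quick Start calls `ensemble.predict(...)`, but this README does not include the ensemble class. The following is a purely hypothetical sketch of a wrapper with the same return shape: `WeightedEnsemble`, its constructor arguments, and the weighted-average rule are assumptions, and the author's actual voting scheme may differ. It also assumes every model's score has already been normalized to a comparable scale.

```python
from typing import Callable, Dict, Tuple

class WeightedEnsemble:
    """Hypothetical wrapper: weighted average of per-model anomaly scores, then a threshold."""

    def __init__(
        self,
        scorers: Dict[str, Callable[[object], float]],
        weights: Dict[str, float],
        threshold: float,
    ):
        self.scorers = scorers      # model name -> function returning an anomaly score
        self.weights = weights      # model name -> ensemble weight
        self.threshold = threshold  # scores above this are labeled fake

    def predict(self, image, device=None) -> Tuple[bool, float, Dict[str, float]]:
        # `device` is unused here; it is accepted only to mirror the Quick Start call.
        individual = {name: float(fn(image)) for name, fn in self.scorers.items()}
        total_weight = sum(self.weights[name] for name in individual)
        score = sum(self.weights[name] * s for name, s in individual.items()) / total_weight
        return score > self.threshold, score, individual

# Dummy scorers that return fixed values stand in for the real models.
ensemble = WeightedEnsemble(
    scorers={"freq_vae": lambda img: 0.9, "iforest": lambda img: 0.3},
    weights={"freq_vae": 2.0, "iforest": 1.0},
    threshold=0.5,
)
is_fake, score, individual_scores = ensemble.predict(image=None)
print(is_fake, round(score, 2), individual_scores)
```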

	## 📦 Model Files

	| File | Description | Size |
	|------|-------------|------|
	| `freq_vae.pth` | Enhanced Frequency VAE weights | ~100 MB |
	| `semantic_svdd.pth` | Semantic Deep SVDD weights | ~90 MB |
	| `edge_flow.pth` | Edge Normalizing Flow weights | ~5 MB |
	| `texture_ocsvm.pkl` | Texture One-Class SVM | ~200 MB |
	| `iforest.pkl` | Isolation Forest | ~150 MB |
	| `lof.pkl` | Local Outlier Factor | ~180 MB |
	| `gmm.pkl` | Gaussian Mixture Model | ~50 MB |
	| `color_model.pkl` | Color Distribution Model | ~10 MB |
	| `stat.pkl` | Statistical Model | ~5 MB |
	| `config.json` | Ensemble configuration | <1 MB |
	| `results_summary.json` | Training metrics | <1 MB |
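
For planning downloads, the table can be expressed as a small manifest. The sketch below groups the model files by the loader they most likely need and adds up the approximate sizes. The extension-to-loader mapping is an assumption based on the Quick Start, which loads `freq_vae.pth` with `torch.load` and imports `pickle`; `loader_for` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Filenames and approximate sizes in MB, copied from the table above.
# The two JSON files (<1 MB each) are left out.
model_files = {
    "freq_vae.pth": 100,
    "semantic_svdd.pth": 90,
    "edge_flow.pth": 5,
    "texture_ocsvm.pkl": 200,
    "iforest.pkl": 150,
    "lof.pkl": 180,
    "gmm.pkl": 50,
    "color_model.pkl": 10,
    "stat.pkl": 5,
}

def loader_for(filename: str) -> str:
    """Map a file extension to the loader assumed to read it."""
    suffix = PurePosixPath(filename).suffix
    return {".pth": "torch.load", ".pkl": "pickle.load"}[suffix]

by_loader = {}
for name in model_files:
    by_loader.setdefault(loader_for(name), []).append(name)

total_mb = sum(model_files.values())
print(by_loader)
print(f"Approximate total download: {total_mb} MB")  # Approximate total download: 790 MB
```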

	## 🔧 Requirements

	```
	torch>=2.0.0
	torchvision>=0.15.0
	numpy>=1.24.0
	pillow>=9.0.0
	scikit-learn>=1.3.0
	scipy>=1.10.0
	huggingface_hub>=0.19.0
	```

	## 🎯 Use Cases

	- Deepfake Detection: Identify AI-generated faces
	- Image Forensics: Detect manipulated images
	- Content Moderation: Filter synthetic content
	- Research: Study AI-generated image characteristics
	- Quality Control: Verify image authenticity

	## ⚠️ Limitations

	- Trained on COCO real images - performance may vary on other domains
	- Requires 256×256 input resolution
	- May struggle with heavily compressed or low-quality images
	- Performance depends on similarity between training and test distributions
	- Not hardened against adversarial attacks (images deliberately crafted to evade detection)

	## 📈 Model Improvements

	This version includes several accuracy enhancements:

	1. Phase Information: VAE uses both magnitude and phase of FFT
	2. Enhanced Features: More comprehensive texture and edge features
	3. Adaptive Threshold: Auto-calibrated at 95th percentile
	4. Optimized Weights: Balanced ensemble voting
	5. Extended Training: Up to 45 minutes per model for better convergence

	## 📝 Citation

	```bibtex
	@misc{fake-detection-ensemble-2024,
	author = {ash12321},
	title = {Fake Image Detection Ensemble - 9 Model System},
	year = {2024},
	publisher = {Hugging Face},
	howpublished = {\url{https://huggingface.co/ash12321/fake-image-detection-ensemble}}
	}
	```

	## 📄 License

	MIT License - Free for research and commercial use

	## 🙏 Acknowledgments

	- COCO Dataset for training data
	- PyTorch and scikit-learn communities
	- Hugging Face for model hosting

	## 📞 Contact

	Questions? Issues? Open an issue or discussion on this repository!

	---

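	Note: This model was trained with one-class learning on real images only, so it does not depend on the artifacts of any particular generator and can, in principle, flag types of fake images not seen during training. On the test set above it favors precision (87.97%) over recall (39.50%), so a "REAL" prediction should not be treated as proof of authenticity.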