	# DeepFake Detector V13 🎯

	A deepfake detection ensemble of three large vision models with 699M parameters in total

	[![Model](https://img.shields.io/badge/Model-V13-blue)](https://huggingface.co/ash12321/deepfake-detector-v13)
	[![Parameters](https://img.shields.io/badge/Parameters-699M-green)](https://huggingface.co/ash12321/deepfake-detector-v13)
	[![F1 Score](https://img.shields.io/badge/F1-0.9313-brightgreen)](https://huggingface.co/ash12321/deepfake-detector-v13)

	## 🚀 Performance Highlights

	- Average F1 across the three models: 0.9313 (mean of the per-model scores, not a measured ensemble F1)
	- Best Model F1: 0.9586 (Model 13.3 - Swin-Large)
	- Total Parameters: 699M (exceeds 500M requirement ✅)
	- Training Time: ~6.1 hours on T4 GPU

	## 📊 Architecture

	The ensemble consists of three large models (one CNN and two vision transformers), trained one after another:

	| Model | Backbone | Parameters | F1 Score | Training Time |
	|-------|----------|------------|----------|---------------|
	| Model 13.1 | ConvNeXt-Large | 198M | 0.8971 | 205.7 min |
	| Model 13.2 | ViT-Large | 304M | 0.9382 | 52.7 min |
	| Model 13.3 | Swin-Large | 197M | 0.9586 | 106.2 min |

	Total: 699M parameters
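
The headline figures can be re-derived from the table. The check below uses only the values listed there; the variable names are illustrative. It also makes explicit that 0.9313 is the arithmetic mean of the three per-model F1 scores.

```python
# Per-model figures copied from the architecture table.
models = {
    "Model 13.1 (ConvNeXt-Large)": {"params_m": 198, "f1": 0.8971, "minutes": 205.7},
    "Model 13.2 (ViT-Large)": {"params_m": 304, "f1": 0.9382, "minutes": 52.7},
    "Model 13.3 (Swin-Large)": {"params_m": 197, "f1": 0.9586, "minutes": 106.2},
}

total_params_m = sum(m["params_m"] for m in models.values())
mean_f1 = sum(m["f1"] for m in models.values()) / len(models)
total_hours = sum(m["minutes"] for m in models.values()) / 60

print(f"Total parameters: {total_params_m}M")         # Total parameters: 699M
print(f"Mean per-model F1: {mean_f1:.4f}")            # Mean per-model F1: 0.9313
print(f"Total training time: {total_hours:.1f} h")    # Total training time: 6.1 h
```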

	### Model Files

	- `model_1.safetensors` - ConvNeXt-Large (752 MB)
	- `model_2.safetensors` - ViT-Large (1159 MB)
	- `model_3.safetensors` - Swin-Large (747 MB)
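
One way to fetch these files programmatically is `hf_hub_download` from the `huggingface_hub` package (installed separately with `pip install huggingface_hub`). This is a minimal sketch: the helper name `download_weights` is hypothetical, and it assumes the three files sit at the root of the model repository.

```python
REPO_ID = "ash12321/deepfake-detector-v13"
MODEL_FILES = (
    "model_1.safetensors",  # ConvNeXt-Large
    "model_2.safetensors",  # ViT-Large
    "model_3.safetensors",  # Swin-Large
)


def download_weights(filenames=MODEL_FILES, repo_id=REPO_ID):
    """Download the given weight files and return {filename: local cache path}."""
    # Imported lazily so the constants above can be used without the package.
    from huggingface_hub import hf_hub_download

    return {
        name: hf_hub_download(repo_id=repo_id, filename=name)
        for name in filenames
    }
```

Calling `download_weights(["model_3.safetensors"])` would fetch only the Swin-Large weights; the returned path can be passed to `load_file` in the usage examples.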

	## 🎯 Usage

	### Installation

	```bash
	pip install torch torchvision timm safetensors pillow
	```

	### Quick Start - Single Model

```python
import torch
import timm
from PIL import Image
from torchvision import transforms
from safetensors.torch import load_file

# Define model architecture
class DeepfakeDetector(torch.nn.Module):
    def __init__(self, backbone_name, dropout=0.3):
        super().__init__()
        # num_classes=0 makes timm return pooled features instead of logits
        self.backbone = timm.create_model(backbone_name, pretrained=False, num_classes=0)

        if hasattr(self.backbone, 'num_features'):
            feat_dim = self.backbone.num_features
        else:
            # Fall back to probing the feature size with a dummy input
            with torch.no_grad():
                feat_dim = self.backbone(torch.randn(1, 3, 224, 224)).shape[1]

        self.classifier = torch.nn.Sequential(
            torch.nn.Linear(feat_dim, 512),
            torch.nn.BatchNorm1d(512),
            torch.nn.GELU(),
            torch.nn.Dropout(dropout),
            torch.nn.Linear(512, 128),
            torch.nn.BatchNorm1d(128),
            torch.nn.GELU(),
            torch.nn.Dropout(dropout * 0.5),
            torch.nn.Linear(128, 1)
        )

    def forward(self, x):
        features = self.backbone(x)
        return self.classifier(features).squeeze(-1)

# Load best model (Model 13.3 - Swin-Large)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = DeepfakeDetector('swin_large_patch4_window7_224', dropout=0.3)
state_dict = load_file('model_3.safetensors')
model.load_state_dict(state_dict)
model = model.to(device)
model.eval()

# Preprocessing
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Predict
image = Image.open('test_image.jpg').convert('RGB')
input_tensor = transform(image).unsqueeze(0).to(device)

with torch.no_grad():
    logits = model(input_tensor)
    # Sigmoid output is the probability that the image is FAKE
    probability = torch.sigmoid(logits).item()
    prediction = 'FAKE' if probability > 0.5 else 'REAL'

print(f"Prediction: {prediction}")
print(f"Fake probability: {probability:.2%}")
```

	### Full Ensemble (Recommended)

```python
import torch
import timm
from PIL import Image
from torchvision import transforms
from safetensors.torch import load_file

class DeepfakeDetector(torch.nn.Module):
    def __init__(self, backbone_name, dropout=0.3):
        super().__init__()
        # num_classes=0 makes timm return pooled features instead of logits
        self.backbone = timm.create_model(backbone_name, pretrained=False, num_classes=0)

        if hasattr(self.backbone, 'num_features'):
            feat_dim = self.backbone.num_features
        else:
            # Fall back to probing the feature size with a dummy input
            with torch.no_grad():
                feat_dim = self.backbone(torch.randn(1, 3, 224, 224)).shape[1]

        self.classifier = torch.nn.Sequential(
            torch.nn.Linear(feat_dim, 512),
            torch.nn.BatchNorm1d(512),
            torch.nn.GELU(),
            torch.nn.Dropout(dropout),
            torch.nn.Linear(512, 128),
            torch.nn.BatchNorm1d(128),
            torch.nn.GELU(),
            torch.nn.Dropout(dropout * 0.5),
            torch.nn.Linear(128, 1)
        )

    def forward(self, x):
        features = self.backbone(x)
        return self.classifier(features).squeeze(-1)

# Model configurations: (timm backbone name, dropout, weights file)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

configs = [
    ('convnext_large', 0.3, 'model_1.safetensors'),
    ('vit_large_patch16_224', 0.35, 'model_2.safetensors'),
    ('swin_large_patch4_window7_224', 0.3, 'model_3.safetensors')
]

# Load all models
models = []
for backbone, dropout, filename in configs:
    model = DeepfakeDetector(backbone, dropout)
    state_dict = load_file(filename)
    model.load_state_dict(state_dict)
    model = model.to(device)
    model.eval()
    models.append(model)

print(f"✓ Loaded {len(models)} models")

# Preprocessing
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Ensemble prediction
def predict_ensemble(image_path):
    image = Image.open(image_path).convert('RGB')
    input_tensor = transform(image).unsqueeze(0).to(device)

    predictions = []
    with torch.no_grad():
        for model in models:
            logits = model(input_tensor)
            prob = torch.sigmoid(logits).item()
            predictions.append(prob)

    # Average ensemble: mean of the per-model FAKE probabilities
    avg_prob = sum(predictions) / len(predictions)
    prediction = 'FAKE' if avg_prob > 0.5 else 'REAL'

    return {
        'prediction': prediction,
        'confidence': avg_prob,  # averaged probability that the image is FAKE
        'individual_predictions': predictions
    }

# Use it
result = predict_ensemble('test_image.jpg')
print(f"Prediction: {result['prediction']}")
print(f"Ensemble fake probability: {result['confidence']:.2%}")
print(f"Individual Models: {[f'{p:.2%}' for p in result['individual_predictions']]}")
```

	## 📈 Training Details

	### Architecture Design

	Each model uses:
	- Backbone: Large pre-trained vision model (frozen initially, then fine-tuned)
	- Classifier Head:
	  - Linear(feat_dim → 512) + BatchNorm + GELU + Dropout
	  - Linear(512 → 128) + BatchNorm + GELU + Dropout
	  - Linear(128 → 1)
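
To put the head in perspective, its trainable parameter count can be worked out by hand from the layer sizes above. The sketch below does that arithmetic. The `feat_dim` values are assumptions based on what timm usually reports as `num_features` for these backbones (1536 for ConvNeXt-Large and Swin-Large, 1024 for ViT-Large); check `model.backbone.num_features` to confirm.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n):
    """Learnable scale and shift (running statistics are buffers, not parameters)."""
    return 2 * n


def head_param_count(feat_dim):
    """Trainable parameters in the classifier head; GELU and Dropout have none."""
    return (
        linear_params(feat_dim, 512) + batchnorm_params(512)
        + linear_params(512, 128) + batchnorm_params(128)
        + linear_params(128, 1)
    )


# Assumed feature dimensions (not stated in this README).
for name, feat_dim in [("convnext_large", 1536),
                       ("vit_large_patch16_224", 1024),
                       ("swin_large_patch4_window7_224", 1536)]:
    print(f"{name}: {head_param_count(feat_dim):,} head parameters")
```

Under these assumptions the head stays below one million parameters, so almost all of each model's size comes from the backbone.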

	### Training Configuration

	- Loss Function: Focal Loss with Label Smoothing
	  - Alpha: 0.25
	  - Gamma: 2.5
	  - Label Smoothing: 0.12
	- Optimizer: AdamW
	  - Learning Rates: [2e-5, 1.5e-5, 1.8e-5]
	  - Weight Decay: 3e-4
	- Scheduler: CosineAnnealingWarmRestarts (T_0=3, T_mult=2)
	- Epochs: 10 per model
	- Batch Sizes: [32, 24, 32]
	- Mixed Precision: FP16 enabled
	- Gradient Accumulation: 4 steps
	- Gradient Checkpointing: Enabled (memory efficiency)
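
The loss and the learning-rate schedule are easier to reason about with numbers. The framework-free sketch below illustrates both for a single example. The way focal loss and label smoothing are combined here is an assumption (the training code is not part of this README), and the schedule assumes the scheduler is stepped once per epoch with a minimum learning rate of 0. Both function names are hypothetical.

```python
import math


def focal_loss_smoothed(logit, label, alpha=0.25, gamma=2.5, smoothing=0.12):
    """Binary focal loss on a smoothed target (one plausible formulation)."""
    p = 1.0 / (1.0 + math.exp(-logit))
    p = min(max(p, 1e-7), 1.0 - 1e-7)                    # avoid log(0)
    target = label * (1.0 - smoothing) + 0.5 * smoothing  # 1 -> 0.94, 0 -> 0.06
    bce = -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
    p_t = p if label == 1 else 1.0 - p                   # probability of the true class
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor down-weights examples the model already gets right.
    return alpha_t * (1.0 - p_t) ** gamma * bce


def warm_restart_lr(epoch, base_lr, t_0=3, t_mult=2, eta_min=0.0):
    """Cosine annealing with warm restarts, evaluated at a whole epoch."""
    t_i, t_cur = t_0, epoch
    while t_cur >= t_i:      # skip past completed cycles
        t_cur -= t_i
        t_i *= t_mult        # each cycle is t_mult times longer than the last
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


# Learning rate per epoch for the first listed rate (2e-5) over 10 epochs.
for epoch in range(10):
    print(epoch, f"{warm_restart_lr(epoch, 2e-5):.2e}")

# Effective batch size with 4 gradient-accumulation steps, assuming it applies to all models.
print([batch * 4 for batch in (32, 24, 32)])   # [128, 96, 128]
```

With `T_0=3` and `T_mult=2` the cycles last 3, 6, and 12 epochs, so under the per-epoch assumption the rate returns to its peak at epochs 3 and 9 (counting from 0).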

	### Data Augmentation

	- Random Horizontal Flip (p=0.5)
	- Random Rotation (±12°)
	- Color Jitter (brightness, contrast, saturation: ±0.15)
	- Normalization: ImageNet stats
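
The augmentations listed above map directly onto standard `torchvision.transforms` classes. The pipeline below is a sketch of how they might be composed for training; the order of the steps, the 224×224 resize (borrowed from the inference examples), and the helper name `build_train_transform` are assumptions, not the original training code.

```python
# Values taken from the list above.
AUGMENTATION = {"flip_p": 0.5, "rotation_degrees": 12, "jitter": 0.15}
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def build_train_transform(image_size=224):
    """Compose the listed augmentations into a training-time transform."""
    # Imported lazily so the settings above can be inspected without torchvision.
    from torchvision import transforms

    jitter = AUGMENTATION["jitter"]
    return transforms.Compose([
        transforms.Resize((image_size, image_size)),
        transforms.RandomHorizontalFlip(p=AUGMENTATION["flip_p"]),
        # degrees=12 samples a rotation angle from [-12, 12]
        transforms.RandomRotation(degrees=AUGMENTATION["rotation_degrees"]),
        # 0.15 samples each factor from [0.85, 1.15]
        transforms.ColorJitter(brightness=jitter, contrast=jitter, saturation=jitter),
        transforms.ToTensor(),
        transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
    ])
```

The random steps belong only in the training transform; the inference examples in the Usage section use just the resize, tensor conversion, and normalization.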

	## 📊 Performance Analysis

	### Model Comparison

	Model 13.1 (ConvNeXt-Large)
	- ✓ Solid baseline: F1 = 0.8971
	- ✓ CNN-based architecture
	- ✓ Good for local feature extraction

	Model 13.2 (ViT-Large)
	- ✓ Strong performance: F1 = 0.9382
	- ✓ Fastest training (52.7 min)
	- ✓ Global attention mechanism

	Model 13.3 (Swin-Large) ⭐ Best Model
	- ✓ Excellent performance: F1 = 0.9586
	- ✓ Hierarchical vision transformer
	- ✓ Highest F1 with the fewest parameters (197M)

	### Ensemble Benefits

	The ensemble approach provides:
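	- Improved Robustness: Different architectures (CNN, global-attention ViT, hierarchical Swin) capture different patterns
	- Reduced Variance: Averaging the three probabilities smooths out individual model errors
	- Better Generalization: Complementary strengths can reduce overfitting to any single architecture's biases
	- Higher Accuracy: Expected ensemble F1 ≈ 0.94-0.96 (an estimate, not a measured result)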
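
Since the ensemble F1 is quoted as an expectation, it may be worth measuring it on a held-out set. The sketch below shows one way to do that from per-model fake probabilities, using the same averaging and 0.5 threshold as the ensemble example. The function names and the toy labels and probabilities are hypothetical and only illustrate the calculation.

```python
def f1_score(labels, predictions):
    """F1 for the positive class (1 = fake)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def ensemble_f1(labels, per_model_probs, threshold=0.5):
    """Average the per-model probabilities per image, threshold, then score."""
    averaged = [sum(probs) / len(probs) for probs in zip(*per_model_probs)]
    predictions = [1 if p > threshold else 0 for p in averaged]
    return f1_score(labels, predictions)


# Hypothetical toy data: 5 images, one row of fake probabilities per model.
labels = [1, 1, 0, 0, 1]
per_model_probs = [
    [0.9, 0.4, 0.2, 0.6, 0.8],
    [0.8, 0.7, 0.1, 0.3, 0.3],
    [0.7, 0.6, 0.3, 0.4, 0.9],
]

for i, probs in enumerate(per_model_probs, start=1):
    single = [1 if p > 0.5 else 0 for p in probs]
    print(f"Model {i} F1: {f1_score(labels, single):.3f}")
print(f"Ensemble F1: {ensemble_f1(labels, per_model_probs):.3f}")
```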

	## 🔧 System Requirements

	Inference (Single Model)
	- GPU: 4GB+ VRAM
	- RAM: 8GB+
	- Storage: 0.75-1.16 GB depending on the model (see Model Files)

	Inference (Full Ensemble)
	- GPU: 12GB+ VRAM (or run models sequentially on smaller GPU)
	- RAM: 16GB+
	- Storage: ~2.7 GB total
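
Running the models sequentially on a smaller GPU means only one model is resident at a time. The pattern can be sketched independently of the framework, as below. `predict_sequentially`, `model_loaders`, and `predict_fn` are hypothetical names; in practice each loader would build a `DeepfakeDetector`, load its weights, and move it to the device.

```python
import gc


def predict_sequentially(model_loaders, predict_fn):
    """Load one model at a time, collect its probability, then release it.

    model_loaders: callables that each return a ready-to-use model.
    predict_fn:    callable mapping a model to a fake probability.
    Returns (average probability, list of individual probabilities).
    """
    probabilities = []
    for load_model in model_loaders:
        model = load_model()
        probabilities.append(predict_fn(model))
        del model      # drop the only reference before loading the next model
        gc.collect()
    return sum(probabilities) / len(probabilities), probabilities
```

With PyTorch on a GPU, calling `torch.cuda.empty_cache()` after `gc.collect()` also returns the freed memory from PyTorch's caching allocator. The trade-off is that every prediction pays the cost of reloading the weights, so this suits occasional or batched inference rather than low-latency serving.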

	Training
	- GPU: T4 (16GB) or better
	- RAM: 12GB+
	- Storage: 8GB+ for checkpoints

	## 📚 Dataset

	Trained on: [`ash12321/deepfake-v13-dataset`](https://huggingface.co/datasets/ash12321/deepfake-v13-dataset)

	## 🔗 Related Models

	- Predecessor: [`ash12321/deepfake-detector-v12`](https://huggingface.co/ash12321/deepfake-detector-v12)

	## 📄 Citation

```bibtex
@misc{v13-deepfake-detector,
  title={DeepFake Detector V13: Large-Scale Ensemble},
  author={Ash},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/ash12321/deepfake-detector-v13}}
}
```

	## 📝 License

	MIT License - See LICENSE file for details

	## 🙏 Acknowledgments

	- Built with PyTorch, timm, and Hugging Face
	- Trained on Google Colab T4 GPU
	- Architectures: ConvNeXt (Meta), ViT (Google), Swin (Microsoft)

	---

	Model Version: 13.0
	Last Updated: November 2024
	Status: Production Ready ✅