# DeepFake Detector V13 🎯 **State-of-the-art deepfake detection ensemble with 699M parameters** [![Model](https://img.shields.io/badge/Model-V13-blue)](https://huggingface.co/ash12321/deepfake-detector-v13) [![Parameters](https://img.shields.io/badge/Parameters-699M-green)](https://huggingface.co/ash12321/deepfake-detector-v13) [![F1 Score](https://img.shields.io/badge/F1-0.9313-brightgreen)](https://huggingface.co/ash12321/deepfake-detector-v13) ## 🚀 Performance Highlights - **Average Ensemble F1**: 0.9313 - **Best Model F1**: 0.9586 (Model 13.3 - Swin-Large) - **Total Parameters**: 699M (exceeds 500M requirement ✅) - **Training Time**: ~6.1 hours on T4 GPU ## 📊 Architecture This model consists of 3 large-scale transformer and CNN models trained sequentially: | Model | Backbone | Parameters | F1 Score | Training Time | |-------|----------|------------|----------|---------------| | **Model 13.1** | ConvNeXt-Large | 198M | 0.8971 | 205.7 min | | **Model 13.2** | ViT-Large | 304M | 0.9382 | 52.7 min | | **Model 13.3** | Swin-Large | 197M | **0.9586** | 106.2 min | **Total: 699M parameters** ### Model Files - `model_1.safetensors` - ConvNeXt-Large (752 MB) - `model_2.safetensors` - ViT-Large (1159 MB) - `model_3.safetensors` - Swin-Large (747 MB) ## 🎯 Usage ### Installation ```bash pip install torch torchvision timm safetensors pillow ``` ### Quick Start - Single Model ```python import torch import timm from PIL import Image from torchvision import transforms from safetensors.torch import load_file # Define model architecture class DeepfakeDetector(torch.nn.Module): def __init__(self, backbone_name, dropout=0.3): super().__init__() self.backbone = timm.create_model(backbone_name, pretrained=False, num_classes=0) if hasattr(self.backbone, 'num_features'): feat_dim = self.backbone.num_features else: with torch.no_grad(): feat_dim = self.backbone(torch.randn(1, 3, 224, 224)).shape[1] self.classifier = torch.nn.Sequential( torch.nn.Linear(feat_dim, 512), torch.nn.BatchNorm1d(512), torch.nn.GELU(), torch.nn.Dropout(dropout), torch.nn.Linear(512, 128), torch.nn.BatchNorm1d(128), torch.nn.GELU(), torch.nn.Dropout(dropout * 0.5), torch.nn.Linear(128, 1) ) def forward(self, x): features = self.backbone(x) return self.classifier(features).squeeze(-1) # Load best model (Model 13.3 - Swin-Large) device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') model = DeepfakeDetector('swin_large_patch4_window7_224', dropout=0.3) state_dict = load_file('model_3.safetensors') model.load_state_dict(state_dict) model = model.to(device) model.eval() # Preprocessing transform = transforms.Compose([ transforms.Resize((224, 224)), transforms.ToTensor(), transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) ]) # Predict image = Image.open('test_image.jpg').convert('RGB') input_tensor = transform(image).unsqueeze(0).to(device) with torch.no_grad(): logits = model(input_tensor) probability = torch.sigmoid(logits).item() prediction = 'FAKE' if probability > 0.5 else 'REAL' print(f"Prediction: {prediction}") print(f"Confidence: {probability:.2%}") ``` ### Full Ensemble (Recommended) ```python import torch import timm from PIL import Image from torchvision import transforms from safetensors.torch import load_file class DeepfakeDetector(torch.nn.Module): def __init__(self, backbone_name, dropout=0.3): super().__init__() self.backbone = timm.create_model(backbone_name, pretrained=False, num_classes=0) if hasattr(self.backbone, 'num_features'): feat_dim = self.backbone.num_features else: with torch.no_grad(): feat_dim = self.backbone(torch.randn(1, 3, 224, 224)).shape[1] self.classifier = torch.nn.Sequential( torch.nn.Linear(feat_dim, 512), torch.nn.BatchNorm1d(512), torch.nn.GELU(), torch.nn.Dropout(dropout), torch.nn.Linear(512, 128), torch.nn.BatchNorm1d(128), torch.nn.GELU(), torch.nn.Dropout(dropout * 0.5), torch.nn.Linear(128, 1) ) def forward(self, x): features = self.backbone(x) return self.classifier(features).squeeze(-1) # Model configurations device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') configs = [ ('convnext_large', 0.3, 'model_1.safetensors'), ('vit_large_patch16_224', 0.35, 'model_2.safetensors'), ('swin_large_patch4_window7_224', 0.3, 'model_3.safetensors') ] # Load all models models = [] for backbone, dropout, filename in configs: model = DeepfakeDetector(backbone, dropout) state_dict = load_file(filename) model.load_state_dict(state_dict) model = model.to(device) model.eval() models.append(model) print(f"✓ Loaded {len(models)} models") # Preprocessing transform = transforms.Compose([ transforms.Resize((224, 224)), transforms.ToTensor(), transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) ]) # Ensemble prediction def predict_ensemble(image_path): image = Image.open(image_path).convert('RGB') input_tensor = transform(image).unsqueeze(0).to(device) predictions = [] with torch.no_grad(): for model in models: logits = model(input_tensor) prob = torch.sigmoid(logits).item() predictions.append(prob) # Average ensemble avg_prob = sum(predictions) / len(predictions) prediction = 'FAKE' if avg_prob > 0.5 else 'REAL' return { 'prediction': prediction, 'confidence': avg_prob, 'individual_predictions': predictions } # Use it result = predict_ensemble('test_image.jpg') print(f"Prediction: {result['prediction']}") print(f"Ensemble Confidence: {result['confidence']:.2%}") print(f"Individual Models: {[f'{p:.2%}' for p in result['individual_predictions']]}") ``` ## 📈 Training Details ### Architecture Design Each model uses: - **Backbone**: Large pre-trained vision model (frozen initially, fine-tuned) - **Classifier Head**: - Linear(feat_dim → 512) + BatchNorm + GELU + Dropout - Linear(512 → 128) + BatchNorm + GELU + Dropout - Linear(128 → 1) ### Training Configuration - **Loss Function**: Focal Loss with Label Smoothing - Alpha: 0.25 - Gamma: 2.5 - Label Smoothing: 0.12 - **Optimizer**: AdamW - Learning Rates: [2e-5, 1.5e-5, 1.8e-5] - Weight Decay: 3e-4 - **Scheduler**: CosineAnnealingWarmRestarts (T_0=3, T_mult=2) - **Epochs**: 10 per model - **Batch Sizes**: [32, 24, 32] - **Mixed Precision**: FP16 enabled - **Gradient Accumulation**: 4 steps - **Gradient Checkpointing**: Enabled (memory efficiency) ### Data Augmentation - Random Horizontal Flip (p=0.5) - Random Rotation (±12°) - Color Jitter (brightness, contrast, saturation: ±0.15) - Normalization: ImageNet stats ## 📊 Performance Analysis ### Model Comparison **Model 13.1 (ConvNeXt-Large)** - ✓ Solid baseline: F1 = 0.8971 - ✓ CNN-based architecture - ✓ Good for local feature extraction **Model 13.2 (ViT-Large)** - ✓ Strong performance: F1 = 0.9382 - ✓ Fastest training (52.7 min) - ✓ Global attention mechanism **Model 13.3 (Swin-Large)** ⭐ **Best Model** - ✓ Excellent performance: F1 = 0.9586 - ✓ Hierarchical vision transformer - ✓ Best balance of accuracy and efficiency ### Ensemble Benefits The ensemble approach provides: - **Improved Robustness**: Different architectures capture different patterns - **Reduced Variance**: Averaging reduces prediction noise - **Better Generalization**: Complementary strengths minimize overfitting - **Higher Accuracy**: Expected ensemble F1 ≈ 0.94-0.96 ## 🔧 System Requirements **Inference (Single Model)** - GPU: 4GB+ VRAM - RAM: 8GB+ - Storage: ~1.2 GB per model **Inference (Full Ensemble)** - GPU: 12GB+ VRAM (or run models sequentially on smaller GPU) - RAM: 16GB+ - Storage: ~2.7 GB total **Training** - GPU: T4 (16GB) or better - RAM: 12GB+ - Storage: 8GB+ for checkpoints ## 📚 Dataset Trained on: [`ash12321/deepfake-v13-dataset`](https://huggingface.co/datasets/ash12321/deepfake-v13-dataset) ## 🔗 Related Models - Predecessor: [`ash12321/deepfake-detector-v12`](https://huggingface.co/ash12321/deepfake-detector-v12) ## 📄 Citation ```bibtex @model{v13-deepfake-detector, title={DeepFake Detector V13: Large-Scale Ensemble}, author={Ash}, year={2024}, publisher={Hugging Face}, howpublished={\url{https://huggingface.co/ash12321/deepfake-detector-v13}} } ``` ## 📝 License MIT License - See LICENSE file for details ## 🙏 Acknowledgments - Built with PyTorch, timm, and Hugging Face - Trained on Google Colab T4 GPU - Architectures: ConvNeXt (Meta), ViT (Google), Swin (Microsoft) --- **Model Version**: 13.0 **Last Updated**: November 2024 **Status**: Production Ready ✅