| | --- |
| | license: afl-3.0 |
| | datasets: |
| | - Ruchi2003/Celeb_DF_Frames |
| | language: |
| | - zh |
| | - en |
| | metrics: |
| | - accuracy |
| | - value:Celeb_DF_v1:9312 |
| | - value:DFDCP:8150 |
| | base_model: |
| | - openai/clip-vit-base-patch32 |
| | new_version: openai/clip-vit-base-patch32 |
| | pipeline_tag: image-classification |
| | library_name: adapter-transformers |
| | tags: |
| | - code |
| | --- |
| | # 🚀 CLIP-Enhanced Deepfake Detection with RAG-Inspired Innovations |
| |
|
| | A cutting-edge deepfake detection framework that integrates **CLIP Vision-Language models** with **Parameter-Efficient Fine-Tuning (PEFT)** and **Retrieval-Augmented Generation (RAG)** inspired techniques for enhanced detection performance across multiple benchmark datasets. |
| |
|
| | ## 📊 Overview |
| |
|
| | This repository implements a novel deepfake detection approach that extends the CLIP (Contrastive Language-Image Pre-training) model through several key innovations: |
| |
|
| | 1. **Parameter-Efficient Fine-Tuning (PEFT)** with LoRA (Low-Rank Adaptation) |
| | 2. **Learnable Text Prompts** for adaptive textual representation |
| | 3. **Hard Negative Mining** for improved discriminative learning |
| | 4. **Memory-Augmented Contrastive Learning** (RAG-inspired) |
| | 5. **Dynamic Knowledge-Augmented Text Prompts** (RAG-inspired) |
| |
|
| | The framework achieves state-of-the-art performance on multiple deepfake detection benchmarks while maintaining computational efficiency through selective parameter updates. |
| |
|
| | ## 🎯 Key Features |
| |
|
| | ### 🔧 **Technical Innovations** |
| |
|
| | | Innovation | Description | Key Benefit | |
| | |------------|-------------|-------------| |
| | | **PEFT with LoRA** | Low-rank adaptation of CLIP transformer layers | 90%+ parameter reduction, efficient fine-tuning | |
| | | **Learnable Text Prompts** | Adaptive text feature learning instead of fixed prompts | Dataset-specific textual representations | |
| | | **Hard Negative Mining** | Focus on challenging misclassification cases | Improved discrimination at decision boundaries | |
| | | **Memory-Augmented Contrastive** | RAG-inspired feature retrieval and augmentation | Enhanced generalization through memory | |
| | | **Knowledge-Augmented Prompts** | Dynamic text prompt enhancement with retrieved knowledge | Context-aware textual representations | |
| |
|
| | ### 📈 **Performance Highlights** |
| |
|
| | - **Multi-dataset evaluation** across FaceForensics++, DeepFakeDetection, FaceShifter, and derivatives |
| | - **Dual evaluation metrics**: Frame-level and video-level AUC/AP |
| | - **Efficient training**: Only ~2% of CLIP parameters are trainable |
| | - **Flexible configuration**: YAML-based experiment configuration |
| |
|
| | ## 🛠️ Installation |
| |
|
| | ### Prerequisites |
| |
|
| | - Python 3.8+ |
| | - PyTorch 2.0+ |
| | - CUDA-capable GPU (recommended) |
| |
|
| | ### Dependencies |
| |
|
| | ```bash |
| | # Core dependencies |
| | pip install torch torchvision transformers |
| | |
| | # PEFT for parameter-efficient fine-tuning |
| | pip install peft |
| | |
| | # Additional utilities |
| | pip install scikit-learn tqdm Pillow pyyaml |
| | |
| | # For development |
| | pip install black flake8 mypy |
| | ``` |
| |
|
| | ## 📁 Project Structure |
| |
|
| | ``` |
| | deepfake-detection/ |
| | ├── train_cvpr2025.py # Main training and evaluation script |
| | ├── config/ |
| | │ └── detector/ |
| | │ └── cvpr2025.yaml # Configuration file |
| | ├── checkpoints/ # Saved model weights |
| | ├── datasets/ # Dataset storage (symlinked) |
| | └── results/ # Evaluation results |
| | ``` |
| |
|
| | ## ⚙️ Configuration |
| |
|
| | The system uses YAML configuration for all experiment settings. Key configuration sections: |
| |
|
| | ### Model Configuration |
| | ```yaml |
| | model: |
| | base_model: "CLIP-ViT-B-32" # or "CLIP-ViT-L-14" |
| | use_peft: true |
| | lora_rank: 16 |
| | lora_alpha: 16 |
| | lora_dropout: 0.1 |
| | ``` |
| |
|
| | ### Training Configuration |
| | ```yaml |
| | training: |
| | nEpochs: 50 |
| | batch_size: 32 |
| | optimizer: "adam" |
| | learning_rate: 1e-4 |
| | temperature: 0.07 # Contrastive learning temperature |
| | ``` |
| |
|
| | ### Innovation Toggles |
| | ```yaml |
| | innovations: |
| | use_learnable_prompts: true |
| | use_hard_mining: true |
| | use_memory_augmented: true |
| | use_knowledge_augmented_prompts: true |
| | ``` |
| |
|
| | ## 🚀 Quick Start |
| |
|
| | ### 1. Training from Scratch |
| |
|
| | ```bash |
| | # Basic training with default configuration |
| | python train_cvpr2025.py |
| | |
| | # With custom configuration |
| | python train_cvpr2025.py --config path/to/custom_config.yaml |
| | |
| | # Specify experiment name |
| | python train_cvpr2025.py --experiment_name "ff++_lora_experiment" |
| | ``` |
| |
|
| | ### 2. Evaluating Pre-trained Models |
| |
|
| | ```bash |
| | # Evaluate a saved checkpoint |
| | python -c "from train_cvpr2025 import test_with_loaded_weights; test_with_loaded_weights('checkpoints/best_lora_weights.pth')" |
| | |
| | # With custom config |
| | python -c "from train_cvpr2025 import test_with_loaded_weights; test_with_loaded_weights('checkpoints/best.pth', 'config/custom.yaml')" |
| | ``` |
| |
|
| | ### 3. Custom Dataset Integration |
| |
|
| | To add a new dataset: |
| |
|
| | 1. Create dataset JSON file in the expected format |
| | 2. Update configuration with dataset paths |
| | 3. Add dataset to `train_dataset` or `test_dataset` lists |
| |
|
| | ## 🔬 Technical Details |
| |
|
| | ### LoRA Implementation |
| |
|
| | The system uses PEFT's LoRA implementation for efficient fine-tuning: |
| |
|
| | ```python |
| | from peft import LoraConfig, get_peft_model |
| | |
| | lora_config = LoraConfig( |
| | r=16, # LoRA rank |
| | lora_alpha=16, |
| | target_modules=["q_proj", "v_proj"], # Attention layers to adapt |
| | lora_dropout=0.1, |
| | bias="none", |
| | task_type=TaskType.FEATURE_EXTRACTION, |
| | ) |
| | |
| | model = get_peft_model(clip_model, lora_config) |
| | ``` |
| |
|
| | ### Memory-Augmented Contrastive Learning |
| |
|
| | Inspired by RAG, this component retrieves similar features from memory banks to augment contrastive learning: |
| |
|
| | ```python |
| | class MemoryBank: |
| | def retrieve(self, query_feat, k=5): |
| | # Retrieve k most similar features |
| | similarities = query_feat @ self.memory.t() |
| | _, indices = torch.topk(similarities, k) |
| | return self.memory[indices] |
| | ``` |
| |
|
| | ### Dynamic Knowledge-Augmented Prompts |
| |
|
| | Text prompts are dynamically enhanced with retrieved knowledge from training: |
| |
|
| | ```python |
| | class KnowledgeAugmentedTextPrompts: |
| | def forward(self, img_feat): |
| | # Retrieve relevant knowledge |
| | real_knowledge, fake_knowledge = self.knowledge_bank.retrieve(img_feat) |
| | |
| | # Augment base prompts with retrieved knowledge |
| | enhanced_real = self.fusion(base_real_prompt, real_knowledge) |
| | enhanced_fake = self.fusion(base_fake_prompt, fake_knowledge) |
| | |
| | return enhanced_real, enhanced_fake |
| | ``` |
| |
|
| | ## 📊 Evaluation Metrics |
| |
|
| | The system provides comprehensive evaluation: |
| |
|
| | ### Frame-Level Metrics |
| | - **AUC**: Area Under ROC Curve |
| | - **AP**: Average Precision |
| |
|
| | ### Video-Level Metrics |
| | - **Video AUC**: Aggregated frame predictions per video |
| | - **Video AP**: Precision-recall at video level |
| |
|
| | ### Dataset Coverage |
| | - FaceForensics++ (c23, c40 compressions) |
| | - DeepFakeDetection |
| | - FaceShifter |
| | - FF-DF, FF-F2F, FF-FS, FF-NT subsets |
| |
|
| | ## 🎨 Visualization Features |
| |
|
| | The training script includes progress tracking: |
| |
|
| | ```python |
| | # Training progress with tqdm |
| | for images, labels in tqdm(train_loader, desc=f"Epoch {epoch}"): |
| | # Training loop |
| | pass |
| | |
| | # Real-time metrics display |
| | print(f"[Eval] {dataset_name}: AUC={auc:.4f} AP={ap:.4f}") |
| | ``` |
| |
|
| | ## 📈 Performance Optimization |
| |
|
| | ### Memory Efficiency |
| | - Gradient checkpointing for large batches |
| | - Mixed precision training (FP16) |
| | - Efficient data loading with multiple workers |
| |
|
| | ### Speed Optimizations |
| | - Pre-computed text feature caching |
| | - Batch-wise retrieval operations |
| | - Optimized data augmentation pipelines |
| |
|
| | ## 🔍 Debugging and Logging |
| |
|
| | Comprehensive logging is built-in: |
| |
|
| | ```python |
| | # File existence checks |
| | if not os.path.exists(full_path): |
| | print(f"[Warning] Image not found: {full_path}") |
| | |
| | # Memory bank statistics |
| | print(f"[Memory] Real samples: {real_size}, Fake samples: {fake_size}") |
| | |
| | # Training progress |
| | print(f"[Train] Epoch {epoch}: loss={loss:.4f}, lr={lr:.6f}") |
| | ``` |
| |
|
| | ## 📚 Citation |
| |
|
| | If you use this code in your research, please cite: |
| |
|
| | ```bibtex |
| | @article{deepfake2025clip, |
| | title={CLIP-Enhanced Deepfake Detection with RAG-Inspired Memory Augmentation}, |
| | author={Your Name}, |
| | journal={CVPR}, |
| | year={2025} |
| | } |
| | ``` |
| |
|
| | ## 🤝 Contributing |
| |
|
| | We welcome contributions! Please: |
| |
|
| | 1. Fork the repository |
| | 2. Create a feature branch |
| | 3. Add tests for new functionality |
| | 4. Submit a pull request |
| |
|
| | ### Code Style |
| | - Follow PEP 8 guidelines |
| | - Use type hints where possible |
| | - Document new functions with docstrings |
| |
|
| | ## 📄 License |
| |
|
| | This project is licensed under the MIT License - see the LICENSE file for details. |
| |
|
| | ## 🙏 Acknowledgments |
| |
|
| | - OpenAI for the CLIP model |
| | - Hugging Face for Transformers and PEFT libraries |
| | - The DeepfakeBench team for benchmark datasets |
| | - All contributors and researchers in the deepfake detection field |
| |
|
| | ## 📞 Contact |
| |
|
| | For questions, issues, or collaborations: |
| |
|
| | - **Issues**: [GitHub Issues](https://github.com/yourrepo/issues) |
| | - **Email**: your.email@institution.edu |
| | - **Discussion**: [GitHub Discussions](https://github.com/yourrepo/discussions) |
| |
|
| | --- |
| |
|
| | **Note**: This implementation is research-oriented and may require adjustments for production deployment. Always validate performance on your specific use case and datasets. |
| |
|
| | ## 🔄 Updates and Maintenance |
| |
|
| | - **Last Updated**: January 2025 |
| | - **Compatible with**: PyTorch 2.0+, Transformers 4.30+ |
| | - **Tested on**: NVIDIA A100, V100, RTX 3090 GPUs |
| |
|
| | For the latest updates and bug fixes, check the [Releases](https://github.com/yourrepo/releases) page. |