# 🚀 CLIP-Enhanced Deepfake Detection with RAG-Inspired Innovations

A deepfake detection framework that integrates CLIP vision-language models with Parameter-Efficient Fine-Tuning (PEFT) and Retrieval-Augmented Generation (RAG)-inspired techniques for enhanced detection performance across multiple benchmark datasets.
## 📊 Overview
This repository implements a novel deepfake detection approach that extends the CLIP (Contrastive Language-Image Pre-training) model through several key innovations:
- Parameter-Efficient Fine-Tuning (PEFT) with LoRA (Low-Rank Adaptation)
- Learnable Text Prompts for adaptive textual representation
- Hard Negative Mining for improved discriminative learning
- Memory-Augmented Contrastive Learning (RAG-inspired)
- Dynamic Knowledge-Augmented Text Prompts (RAG-inspired)
The framework achieves state-of-the-art performance on multiple deepfake detection benchmarks while maintaining computational efficiency through selective parameter updates.
## 🎯 Key Features

### 🔧 Technical Innovations
| Innovation | Description | Key Benefit |
|---|---|---|
| PEFT with LoRA | Low-rank adaptation of CLIP transformer layers | 90%+ parameter reduction, efficient fine-tuning |
| Learnable Text Prompts | Adaptive text feature learning instead of fixed prompts | Dataset-specific textual representations |
| Hard Negative Mining | Focus on challenging misclassification cases | Improved discrimination at decision boundaries |
| Memory-Augmented Contrastive | RAG-inspired feature retrieval and augmentation | Enhanced generalization through memory |
| Knowledge-Augmented Prompts | Dynamic text prompt enhancement with retrieved knowledge | Context-aware textual representations |
### 📈 Performance Highlights
- Multi-dataset evaluation across FaceForensics++, DeepFakeDetection, FaceShifter, and derivatives
- Dual evaluation metrics: Frame-level and video-level AUC/AP
- Efficient training: Only ~2% of CLIP parameters are trainable
- Flexible configuration: YAML-based experiment configuration
## 🛠️ Installation

### Prerequisites
- Python 3.8+
- PyTorch 2.0+
- CUDA-capable GPU (recommended)
### Dependencies

```bash
# Core dependencies
pip install torch torchvision transformers

# PEFT for parameter-efficient fine-tuning
pip install peft

# Additional utilities
pip install scikit-learn tqdm Pillow pyyaml

# For development
pip install black flake8 mypy
```
## 📁 Project Structure

```
deepfake-detection/
├── train_cvpr2025.py          # Main training and evaluation script
├── config/
│   └── detector/
│       └── cvpr2025.yaml      # Configuration file
├── checkpoints/               # Saved model weights
├── datasets/                  # Dataset storage (symlinked)
└── results/                   # Evaluation results
```
## ⚙️ Configuration
The system uses YAML configuration for all experiment settings. Key configuration sections:
### Model Configuration

```yaml
model:
  base_model: "CLIP-ViT-B-32"  # or "CLIP-ViT-L-14"
  use_peft: true
  lora_rank: 16
  lora_alpha: 16
  lora_dropout: 0.1
```
### Training Configuration

```yaml
training:
  nEpochs: 50
  batch_size: 32
  optimizer: "adam"
  learning_rate: 1e-4
  temperature: 0.07  # Contrastive learning temperature
```
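The `temperature` value scales the similarity logits in the contrastive objective. As a rough illustration of what it does (a hypothetical standalone helper, not the repository's exact loss), a symmetric CLIP-style contrastive loss looks like this:

```python
import numpy as np

def clip_contrastive_loss(img_feats, txt_feats, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text features."""
    # L2-normalize so dot products are cosine similarities
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    txt = txt_feats / np.linalg.norm(txt_feats, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (B, B); matching pairs on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))   # target = diagonal entry per row

    # Average the image-to-text and text-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

A lower temperature sharpens the softmax, so matching pairs must dominate the similarity matrix more strongly; 0.07 is CLIP's conventional default.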
### Innovation Toggles

```yaml
innovations:
  use_learnable_prompts: true
  use_hard_mining: true
  use_memory_augmented: true
  use_knowledge_augmented_prompts: true
```
## 🚀 Quick Start

### 1. Training from Scratch

```bash
# Basic training with default configuration
python train_cvpr2025.py

# With a custom configuration
python train_cvpr2025.py --config path/to/custom_config.yaml

# Specify an experiment name
python train_cvpr2025.py --experiment_name "ff++_lora_experiment"
```
### 2. Evaluating Pre-trained Models

```bash
# Evaluate a saved checkpoint
python -c "from train_cvpr2025 import test_with_loaded_weights; test_with_loaded_weights('checkpoints/best_lora_weights.pth')"

# With a custom config
python -c "from train_cvpr2025 import test_with_loaded_weights; test_with_loaded_weights('checkpoints/best.pth', 'config/custom.yaml')"
```
### 3. Custom Dataset Integration

To add a new dataset:

1. Create a dataset JSON file in the expected format
2. Update the configuration with the dataset paths
3. Add the dataset to the `train_dataset` or `test_dataset` lists
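The exact JSON schema is not documented here, so the layout below is only a plausible sketch; the field names (`frames`, `path`, `label`, `video_id`) are assumptions to adapt to the project's actual loaders:

```python
import json

# Hypothetical dataset JSON layout -- adjust field names to match the real schema.
example = {
    "dataset_name": "MyDeepfakeSet",
    "frames": [
        {"path": "videos/vid001/frame_0001.png", "label": 0, "video_id": "vid001"},
        {"path": "videos/vid002/frame_0001.png", "label": 1, "video_id": "vid002"},
    ],
}

def load_dataset_json(text):
    """Parse a dataset JSON string into parallel (paths, labels, video_ids) lists."""
    frames = json.loads(text)["frames"]
    paths = [f["path"] for f in frames]
    labels = [f["label"] for f in frames]
    video_ids = [f["video_id"] for f in frames]
    return paths, labels, video_ids
```

Keeping `video_id` alongside each frame is what makes the video-level aggregation in the evaluation section possible.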
## 🔬 Technical Details

### LoRA Implementation
The system uses PEFT's LoRA implementation for efficient fine-tuning:
```python
from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    r=16,                                 # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # Attention projections to adapt
    lora_dropout=0.1,
    bias="none",
    task_type=TaskType.FEATURE_EXTRACTION,
)
model = get_peft_model(clip_model, lora_config)
```
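For intuition, LoRA reparameterizes an adapted weight as W + (alpha/r)·B·A, where A and B are low-rank factors and only they are trained. A from-scratch NumPy sketch of the idea (illustrative only, not PEFT's internals):

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha/r) * B @ A."""

    def __init__(self, in_dim, out_dim, r=16, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(out_dim, in_dim))   # frozen "pretrained" weight
        self.A = rng.normal(size=(r, in_dim)) * 0.01  # trainable down-projection
        self.B = np.zeros((out_dim, r))               # trainable up-projection, zero-init
        self.scale = alpha / r

    def forward(self, x):
        # Base path plus scaled low-rank path; at init B == 0, so the
        # adapter leaves the pretrained behaviour unchanged.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size  # W stays frozen
```

With `in_dim = out_dim = 768` and `r = 16`, the adapter trains 2 × 16 × 768 = 24,576 parameters versus 589,824 for the full weight, about 4% of that one layer, which is how the overall ~2% trainable-parameter budget is achieved.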
### Memory-Augmented Contrastive Learning
Inspired by RAG, this component retrieves similar features from memory banks to augment contrastive learning:
```python
import torch

class MemoryBank:
    def retrieve(self, query_feat, k=5):
        # Retrieve the k most similar stored features (dot-product similarity)
        similarities = query_feat @ self.memory.t()
        _, indices = torch.topk(similarities, k)
        return self.memory[indices]
```
### Dynamic Knowledge-Augmented Prompts
Text prompts are dynamically enhanced with retrieved knowledge from training:
```python
class KnowledgeAugmentedTextPrompts:
    def forward(self, img_feat):
        # Retrieve relevant knowledge conditioned on the image feature
        real_knowledge, fake_knowledge = self.knowledge_bank.retrieve(img_feat)
        # Augment the base prompts with the retrieved knowledge
        enhanced_real = self.fusion(self.base_real_prompt, real_knowledge)
        enhanced_fake = self.fusion(self.base_fake_prompt, fake_knowledge)
        return enhanced_real, enhanced_fake
```
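Hard negative mining, the remaining innovation from the table above, is not shown in these snippets. The repository's exact strategy isn't reproduced here; one common formulation simply keeps the highest-loss samples in each batch, as in this hypothetical sketch:

```python
def mine_hard_examples(per_sample_losses, k=8):
    """Return indices of the k highest-loss samples in a batch.

    These are the "hard" cases near the decision boundary; a training
    loop can up-weight them or recompute the loss on this subset only.
    """
    order = sorted(range(len(per_sample_losses)),
                   key=lambda i: per_sample_losses[i],
                   reverse=True)
    return order[:k]
```

Focusing gradient updates on these examples sharpens the real/fake boundary instead of spending capacity on samples the model already classifies easily.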
## 📊 Evaluation Metrics
The system provides comprehensive evaluation:
### Frame-Level Metrics
- AUC: Area Under ROC Curve
- AP: Average Precision
### Video-Level Metrics
- Video AUC: Aggregated frame predictions per video
- Video AP: Precision-recall at video level
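Video-level scores are obtained by averaging each video's frame predictions before computing the metric. A minimal pure-Python sketch (the rank-based AUC below assumes no tied scores; in practice scikit-learn's `roc_auc_score` would be used):

```python
from collections import defaultdict

def video_level_scores(frame_preds, frame_labels, video_ids):
    """Average frame predictions per video; each video's label comes from its frames."""
    preds, labels = defaultdict(list), {}
    for p, y, v in zip(frame_preds, frame_labels, video_ids):
        preds[v].append(p)
        labels[v] = y
    vids = sorted(preds)
    return ([sum(preds[v]) / len(preds[v]) for v in vids],
            [labels[v] for v in vids])

def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U); assumes no tied scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    pos_ranks = [r + 1 for r, i in enumerate(ranked) if labels[i] == 1]
    n_pos = len(pos_ranks)
    n_neg = len(scores) - n_pos
    return (sum(pos_ranks) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

Averaging over frames smooths out per-frame noise, which is why video-level AUC is typically higher than frame-level AUC on the same model.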
### Dataset Coverage
- FaceForensics++ (c23, c40 compressions)
- DeepFakeDetection
- FaceShifter
- FF-DF, FF-F2F, FF-FS, FF-NT subsets
## 🎨 Visualization Features
The training script includes progress tracking:
```python
from tqdm import tqdm

# Training progress bar
for images, labels in tqdm(train_loader, desc=f"Epoch {epoch}"):
    ...  # training step

# Real-time metrics display
print(f"[Eval] {dataset_name}: AUC={auc:.4f} AP={ap:.4f}")
```
## 📈 Performance Optimization

### Memory Efficiency
- Gradient checkpointing for large batches
- Mixed precision training (FP16)
- Efficient data loading with multiple workers
### Speed Optimizations
- Pre-computed text feature caching
- Batch-wise retrieval operations
- Optimized data augmentation pipelines
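Pre-computed text feature caching amounts to memoizing the text encoder for fixed prompts so each is encoded once per run. A minimal sketch of the pattern (`TextFeatureCache` and `encode_fn` are hypothetical names, not APIs from this repository):

```python
class TextFeatureCache:
    """Memoize text-encoder outputs so fixed prompts are encoded only once."""

    def __init__(self, encode_fn):
        self.encode_fn = encode_fn  # e.g. a wrapped CLIP text encoder
        self.cache = {}
        self.misses = 0

    def __call__(self, prompt):
        if prompt not in self.cache:
            self.cache[prompt] = self.encode_fn(prompt)  # encode on first use
            self.misses += 1
        return self.cache[prompt]
```

Note this only helps the *fixed* prompt pathway; the knowledge-augmented prompts described above depend on the image feature and must still be computed per batch.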
## 🔍 Debugging and Logging
Comprehensive logging is built-in:
```python
import os

# File existence checks
if not os.path.exists(full_path):
    print(f"[Warning] Image not found: {full_path}")

# Memory bank statistics
print(f"[Memory] Real samples: {real_size}, Fake samples: {fake_size}")

# Training progress
print(f"[Train] Epoch {epoch}: loss={loss:.4f}, lr={lr:.6f}")
```
## 📚 Citation
If you use this code in your research, please cite:
```bibtex
@inproceedings{deepfake2025clip,
  title={CLIP-Enhanced Deepfake Detection with RAG-Inspired Memory Augmentation},
  author={Your Name},
  booktitle={CVPR},
  year={2025}
}
```
## 🤝 Contributing

We welcome contributions! Please:

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Submit a pull request
### Code Style
- Follow PEP 8 guidelines
- Use type hints where possible
- Document new functions with docstrings
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments
- OpenAI for the CLIP model
- Hugging Face for Transformers and PEFT libraries
- The DeepfakeBench team for benchmark datasets
- All contributors and researchers in the deepfake detection field
## 📞 Contact
For questions, issues, or collaborations:
- Issues: GitHub Issues
- Email: your.email@institution.edu
- Discussion: GitHub Discussions
> **Note:** This implementation is research-oriented and may require adjustments for production deployment. Always validate performance on your specific use case and datasets.
## 🔄 Updates and Maintenance
- Last Updated: January 2025
- Compatible with: PyTorch 2.0+, Transformers 4.30+
- Tested on: NVIDIA A100, V100, RTX 3090 GPUs
For the latest updates and bug fixes, check the Releases page.