---
license: apache-2.0
pipeline_tag: image-classification
library_name: transformers
tags:
- deep-fake
- ViT
- detection
- Image
- transformers-4.49.0.dev0
- precision-92.12
- v2
base_model:
- google/vit-base-patch16-224-in21k
---


# **Deep-Fake-Detector-v2-Model**


# **Overview**


The **Deep-Fake-Detector-v2-Model** is a deep learning model for detecting deepfake images. It is built on the **Vision Transformer (ViT)** architecture, specifically `google/vit-base-patch16-224-in21k`, fine-tuned on a dataset of real and deepfake images. The model classifies an image as either "Realism" or "Deepfake", reaching 92.12% accuracy on a held-out test set of 56,001 images (see the classification report below), making it a practical tool for flagging manipulated media.


```
Classification report:

              precision    recall  f1-score   support

     Realism     0.9683    0.8708    0.9170     28001
    Deepfake     0.8826    0.9715    0.9249     28000

    accuracy                         0.9212     56001
   macro avg     0.9255    0.9212    0.9210     56001
weighted avg     0.9255    0.9212    0.9210     56001
```


**Confusion Matrix**:
```
[[True Positives, False Negatives],
 [False Positives, True Negatives]]
```
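In this layout the first row corresponds to the true "Realism" class and the second to the true "Deepfake" class. As a minimal sketch, such a matrix can be computed with scikit-learn; the label arrays below are hypothetical placeholders standing in for the evaluation split:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth and predicted labels; in practice these come
# from running the detector over the evaluation set.
y_true = ["Realism", "Realism", "Deepfake", "Deepfake"]
y_pred = ["Realism", "Deepfake", "Deepfake", "Deepfake"]

# labels=["Realism", "Deepfake"] reproduces the layout shown above:
# row 0 = [TP, FN] for "Realism", row 1 = [FP, TN].
cm = confusion_matrix(y_true, y_pred, labels=["Realism", "Deepfake"])
print(cm)
```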


**<span style="color:red;">Update:</span>** The previous checkpoint was trained on a smaller classification dataset. Although it scored well in evaluation, its real-world performance was only average because the training set lacked variation. This update uses a larger dataset to improve the detection of fake images.


| Repository | Link |
|------------|------|
| Deep Fake Detector v2 Model | [GitHub Repository](https://github.com/PRITHIVSAKTHIUR/Deep-Fake-Detector-Model) |


# **Key Features**
- **Architecture**: Vision Transformer (ViT) - `google/vit-base-patch16-224-in21k`.
- **Input**: RGB images resized to 224x224 pixels.
- **Output**: Binary classification ("Realism" or "Deepfake").
- **Training Dataset**: A curated dataset of real and deepfake images.
- **Fine-Tuning**: The model is fine-tuned using Hugging Face's `Trainer` API with data augmentation.
- **Performance**: 92.12% accuracy and a macro F1 of 0.9210 on the test set (see the report above).


# **Model Architecture**
The model is based on the **Vision Transformer (ViT)**, which treats an image as a sequence of patches and applies a transformer encoder to learn spatial relationships between them. Key components:
- **Patch Embedding**: Divides the input image into fixed-size patches (16x16 pixels).
- **Transformer Encoder**: Processes patch embeddings using multi-head self-attention.
- **Classification Head**: A fully connected layer for binary classification; a sketch of how this head is attached appears below.
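As a rough sketch (not necessarily the released checkpoint's exact configuration), a binary head can be attached to the pretrained backbone as follows; the id-to-label ordering here follows the classification report above and is an assumption:

```python
from transformers import ViTForImageClassification

# A 224x224 input split into 16x16 patches yields 14 * 14 = 196 patch tokens;
# the final [CLS] token embedding feeds the two-way classification head.
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=2,
    id2label={0: "Realism", 1: "Deepfake"},  # assumed ordering
    label2id={"Realism": 0, "Deepfake": 1},
)
```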


# **Training Details**
- **Optimizer**: AdamW with a learning rate of `1e-6`.
- **Batch Size**: 32 for training, 8 for evaluation.
- **Epochs**: 2.
- **Data Augmentation**:
  - Random rotation (±90 degrees).
  - Random sharpness adjustment.
  - Random resizing and cropping.
- **Loss Function**: Cross-Entropy Loss.
- **Evaluation Metrics**: Accuracy, F1 Score, and Confusion Matrix. A sketch of this setup follows the list.
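A minimal sketch of how such a run can be set up with the `Trainer` API, assuming `train_ds` and `eval_ds` are preprocessed datasets yielding `pixel_values` and `labels` (illustrative only, not the authors' exact script):

```python
from torchvision import transforms
from transformers import Trainer, TrainingArguments, ViTForImageClassification

# torchvision equivalents of the augmentations listed above; these would be
# applied to the training images in the dataset's preprocessing step.
train_transforms = transforms.Compose([
    transforms.RandomRotation(degrees=90),   # random rotation (±90°)
    transforms.RandomAdjustSharpness(2.0),   # random sharpness adjustment
    transforms.RandomResizedCrop(224),       # random resize + crop
    transforms.ToTensor(),
])

model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k", num_labels=2
)

args = TrainingArguments(
    output_dir="deepfake-detector-v2",  # hypothetical output path
    learning_rate=1e-6,                 # Trainer uses AdamW by default
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    num_train_epochs=2,
)

# Cross-entropy loss is applied automatically for classification labels.
trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds, eval_dataset=eval_ds)
trainer.train()
```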


# **Inference with Hugging Face Pipeline**
```python
from transformers import pipeline

# Load the model
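# (device=0 selects the first GPU; use device=-1 to run on CPU)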
pipe = pipeline('image-classification', model="prithivMLmods/Deep-Fake-Detector-v2-Model", device=0)

# Predict on an image
result = pipe("path_to_image.jpg")
print(result)
```
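The pipeline returns a list of `{'label': ..., 'score': ...}` dictionaries sorted by score, one entry per class.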


# **Inference with PyTorch**
```python
import torch
from PIL import Image
from transformers import ViTForImageClassification, ViTImageProcessor

# Load the model and processor
model = ViTForImageClassification.from_pretrained("prithivMLmods/Deep-Fake-Detector-v2-Model")
processor = ViTImageProcessor.from_pretrained("prithivMLmods/Deep-Fake-Detector-v2-Model")

# Load and preprocess the image
image = Image.open("path_to_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Perform inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class = torch.argmax(logits, dim=1).item()

# Map class index to label
label = model.config.id2label[predicted_class]
print(f"Predicted Label: {label}")
```
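To report class probabilities instead of a single hard label, the logits from the snippet above can be passed through a softmax (this continues with the `logits` and `model` variables already defined):

```python
# Convert raw logits into a probability over the two classes.
probs = torch.softmax(logits, dim=1)[0]
for idx, p in enumerate(probs):
    print(f"{model.config.id2label[idx]}: {p.item():.4f}")
```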

# **Dataset**
The model is fine-tuned on a curated dataset containing:
- **Real Images**: Authentic images of human faces.
- **Fake Images**: Deepfake images generated using advanced AI techniques.


# **Limitations**
- The model is trained on a specific dataset and may not generalize well to other deepfake datasets or domains.
- Performance may degrade on low-resolution or heavily compressed images.
- The model is designed for image classification and does not detect deepfake videos directly.


# **Ethical Considerations**
- **Misuse**: This model should not be used for malicious purposes, such as creating or spreading deepfakes.
- **Bias**: The model may inherit biases from the training dataset. Care should be taken to ensure fairness and inclusivity.
- **Transparency**: Users should be informed when deepfake detection tools are used to analyze their content.


# **Future Work**
- Extend the model to detect deepfake videos.
- Improve generalization by training on larger and more diverse datasets.
- Incorporate explainability techniques to provide insights into model predictions.


# **Citation**

```bibtex
@misc{Deep-Fake-Detector-v2-Model,
  author = {prithivMLmods},
  title  = {Deep-Fake-Detector-v2-Model},
  year   = {2024},
  note   = {Initial release: 21 Mar 2024; updated 31 Jan 2025 and 02 Feb 2025}
}
```