hamzenium/ViT-Deepfake-Classifier

Image Classification

deepfake detection

fake-image detection


hamzenium committed on May 9, 2025

Commit 42053c6 · verified · 1 Parent(s): c8c46cf

Create README.md

Files changed (1)

README.md +64 -0

README.md ADDED

@@ -0,0 +1,64 @@

+---
+license: mit
+language:
+- en
+base_model:
+- google/vit-base-patch16-224
+pipeline_tag: image-classification
+tags:
+- deepfake
+- fakeimages
+- detector
+- fake
+- vision-transformer
+- vit
+- image-classification
+- computer-vision
+- deep-learning
+model_name: ViT Deepfake Detector
+model_creator: Hamza Sohail, Ayaan Mohammed, Shadab Karim, Kirti Dhir
+model_type: vision-transformer
+datasets:
+- faceforensics++
+- celeb-df
+- dfdc
+- custom-generated
+library_name: transformers
+library_version: "4.40.0"
+inference: true
+model_description: |
+  This model is a fine-tuned version of Google's `vit-base-patch16-224` Vision Transformer, trained specifically for the binary classification task of detecting deepfake images. It outputs probabilities indicating whether a given image is real or fake.
+  The model was trained using a combination of real and manipulated images sourced from the FaceForensics++, Celeb-DF, and DFDC datasets, along with additional synthetic samples. It leverages the ViT architecture's ability to capture spatial and contextual features across image patches for effective fake content detection.
+  The primary applications of this model are fake image detection, digital media integrity validation, and social platform moderation tools.
+training_details: |
+  - Base Model: google/vit-base-patch16-224
+  - Epochs: 10
+  - Optimizer: AdamW
+  - Loss: CrossEntropyLoss
+  - Learning rate: 5e-5
+  - Scheduler: CosineAnnealingLR
+  - Batch size: 32
+  - Framework: PyTorch with Hugging Face Transformers
+  - Hardware: NVIDIA Tesla T4 GPU
+evaluation: |
+  The model was evaluated on a stratified test set of 10,000 images from multiple sources, achieving:
+  - Accuracy: 95.7%
+  - Precision: 95.7%
+  - Recall: 95.7%
+  - F1-score: 95.7%
+  Confusion matrix and ROC curves were generated to analyze misclassifications and detection confidence.
+intended_uses: |
+  This model is intended for:
+  - Automated detection of manipulated or deepfake images in social media content.
+  - Research in digital forensics and AI ethics.
+  - Educational purposes for understanding the application of Vision Transformers.
+  **Limitations:** This model may not generalize to manipulation techniques absent from the training datasets. It is not intended for real-time legal or security-critical applications without additional verification mechanisms.
+example_usage: