hamzenium/ViT-Deepfake-Classifier

@@ -1,39 +1,181 @@
----
----
-license: mit
-language:
-- en
-base_model:
-- google/vit-base-patch16-224
-pipeline_tag: image-classification
-model_name: ViT Deepfake Detector
-model_creator: Hamza Sohail, Ayaan Mohammed, Shadab Karim, Kirti Dhir
-model_type: vision-transformer
-library_name: transformers
-library_version: "4.40.0"
-inference: true
-model_description: |
-  A fine-tuned Vision Transformer (`vit-base-patch16-224`) for classifying real vs. fake images. Trained on FaceForensics++, Celeb-DF, DFDC, and custom samples. Outputs real/fake probabilities for input images.
-training_details: |
-  - Epochs: 10
-  - Optimizer: AdamW
-  - Loss: CrossEntropy
-  - LR: 5e-5
-  - Batch size: 32
-  - GPU: Tesla T4
-evaluation: |
-  Evaluated on 10,000 images:
-  - Accuracy: 95.7%
-  - Precision/Recall/F1: 95.7%
-intended_uses: |
-  For fake image detection in research, moderation, and education. Not for legal/critical decisions without further verification.
-tags:
-- deepfake
-- fakeimages
-- detector
-- vit
-- computer-vision
-- deep-learning

+# ViT Deepfake Detection Model
+## Model Description
+This is a Vision Transformer (ViT) fine-tuned for binary image classification, with the goal of detecting deepfake images. It is based on `google/vit-base-patch16-224-in21k` and was fine-tuned on the OpenForensics dataset to distinguish real images from fake (AI-generated or manipulated) ones.
+## Model Details
+- **Model Type:** Vision Transformer (ViT) for Image Classification
+- **Base Model:** google/vit-base-patch16-224-in21k
+- **Task:** Binary Image Classification (Real vs Fake Detection)
+- **Language:** N/A (Computer Vision)
+- **License:** Apache 2.0
+## Intended Use
+### Primary Use Cases
+- Detecting AI-generated or manipulated images
+- Content moderation and verification
+- Research in deepfake detection
+- Media authenticity verification
+### Out-of-Scope Use
+- This model should not be used as the sole method for making critical decisions about content authenticity
+- Not intended for surveillance or privacy-invasive applications
+- May not generalize well to deepfake techniques not present in the training data
+## Training Data
+The model was trained on the **OpenForensics dataset** with the following distribution:
+- **Training Set:** 16,000 images
+- **Validation Set:** 2,000 images
+- **Test Set:** 2,000 images
+Images were preprocessed with `ViTImageProcessor` using its standard normalization.
+## Training Procedure
+### Hyperparameters
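+```python
+from transformers import TrainingArguments
+# Values as reported for this run; output_dir is a placeholder, not taken from the run
+training_args = TrainingArguments(
+    output_dir="./results",
+    per_device_train_batch_size=24,
+    gradient_accumulation_steps=1,
+    fp16=True,  # mixed precision
+    num_train_epochs=10,
+    learning_rate=3e-5,
+    weight_decay=0.02,
+    warmup_ratio=0.08,
+    lr_scheduler_type="cosine",
+    label_smoothing_factor=0.05,
+    # Optimizer left at the Trainer default (AdamW)
+)
+```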
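To make the warmup ratio and cosine scheduler concrete, the following sketch works out the step counts and learning-rate curve they imply. It is an illustration only: it assumes a single GPU, no gradient accumulation, and that the final partial batch of each epoch is kept. The helper name `lr_at` is hypothetical and not part of the training code.

```python
import math

# Values taken from the hyperparameters listed above
TRAIN_IMAGES = 16_000
BATCH_SIZE = 24
EPOCHS = 10
PEAK_LR = 3e-5
WARMUP_RATIO = 0.08

# Assumption: one device, final partial batch of each epoch is kept
steps_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the learning rate climbs linearly to 3e-5 during the first epoch and then decays toward zero by the last step.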
+### Training Hardware
+- GPU: Tesla T4
+- Training Time: ~14 minutes for 10 epochs
+### Data Augmentation
+Standard ViT preprocessing with normalization applied via `ViTImageProcessor`.
+## Performance
+### Validation Set Results (Best Epoch - Epoch 5)
+| Metric | Score |
+|--------|-------|
+| Accuracy | 96.22% |
+| F1 Score | 96.22% |
+| Precision | 96.30% |
+| Recall | 96.22% |
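The table does not say how precision, recall and F1 were averaged over the two classes. Recall matching accuracy exactly is consistent with support-weighted averaging, because weighted recall is algebraically equal to accuracy. The sketch below assumes that averaging mode and uses hypothetical confusion-matrix counts, not this model's actual predictions; `weighted_metrics` is an illustrative helper.

```python
def weighted_metrics(confusion):
    """Support-weighted precision, recall and F1 from a square confusion matrix.

    confusion[i][j] = number of samples with true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n_classes)) / total
    precision = recall = f1 = 0.0
    for c in range(n_classes):
        support = sum(confusion[c])                   # true samples of class c
        predicted = sum(row[c] for row in confusion)  # samples predicted as c
        p = confusion[c][c] / predicted if predicted else 0.0
        r = confusion[c][c] / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts (NOT this model's results); rows are true (real, fake)
example = [[480, 20],
           [18, 482]]
metrics = weighted_metrics(example)
print({name: round(value, 4) for name, value in metrics.items()})
```

With any confusion matrix, the `recall` and `accuracy` entries returned here coincide, while precision and F1 can differ slightly, which matches the pattern in the table.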
+### Test Set Results
+| Metric | Score |
+|--------|-------|
+| Accuracy | **96.56%** |
+### Training Progress
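+Training loss fell steadily, while validation accuracy peaked at epoch 5 and then plateaued at 95.67-95.78% for epochs 6-10. Validation loss was lowest at epoch 3: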
+| Epoch | Training Loss | Validation Loss | Accuracy | F1 Score |
+|-------|---------------|-----------------|----------|----------|
+| 1 | 0.2259 | 0.2567 | 92.89% | 92.88% |
+| 2 | 0.2002 | 0.2360 | 93.44% | 93.43% |
+| 3 | 0.1388 | 0.1925 | 96.11% | 96.11% |
+| 4 | 0.1322 | 0.2161 | 95.67% | 95.67% |
+| 5 | 0.1182 | 0.2208 | **96.22%** | **96.22%** |
+| 6-10 | 0.1170-0.1171 | 0.2132-0.2142 | 95.67-95.78% | 95.67-95.78% |
+## Usage
+### Loading the Model
+```python
+from transformers import ViTImageProcessor, ViTForImageClassification
+from PIL import Image
+import torch
+# Load model and processor
+model = ViTForImageClassification.from_pretrained("YOUR_USERNAME/vit-deepfake-detector")
+processor = ViTImageProcessor.from_pretrained("YOUR_USERNAME/vit-deepfake-detector")
+# Load and preprocess image
+image = Image.open("path_to_image.jpg").convert("RGB")  # ensure 3-channel input for grayscale/RGBA files
+inputs = processor(images=image, return_tensors="pt")
+# Make prediction
+with torch.no_grad():
+    outputs = model(**inputs)
+    logits = outputs.logits
+    predicted_class = logits.argmax(-1).item()
+# Get label
+labels = {0: "real", 1: "fake"}
+print(f"Prediction: {labels[predicted_class]}")
+# Get confidence scores
+probabilities = torch.nn.functional.softmax(logits, dim=-1)
+confidence = probabilities[0][predicted_class].item()
+print(f"Confidence: {confidence:.2%}")
+```
+### Batch Prediction
+```python
+from transformers import pipeline
+# Create classification pipeline
+classifier = pipeline("image-classification", model="YOUR_USERNAME/vit-deepfake-detector")
+# Predict on single image
+result = classifier("path_to_image.jpg")
+print(result)
+# Predict on multiple images
+images = ["image1.jpg", "image2.jpg", "image3.jpg"]
+results = classifier(images)
+for img, result in zip(images, results):
+    print(f"{img}: {result}")
+```
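For a single image, the image-classification pipeline returns a list of `{"label", "score"}` dictionaries, and for a list of images it returns one such list per image. A minimal sketch of post-processing that output follows. It assumes the labels are `"real"` and `"fake"` (the actual strings depend on the checkpoint's `id2label` config), and both the `summarize` helper and the 0.90 review cutoff are hypothetical choices for illustration.

```python
def summarize(predictions, review_below=0.90):
    """Return (label, score, needs_review) for one image's pipeline output.

    predictions: list of {"label": str, "score": float} dicts for a single image.
    review_below: hypothetical confidence cutoff for routing to human review.
    """
    top = max(predictions, key=lambda p: p["score"])
    return top["label"], top["score"], top["score"] < review_below


# Hypothetical outputs for two images (not real model results)
sample_results = [
    [{"label": "fake", "score": 0.98}, {"label": "real", "score": 0.02}],
    [{"label": "real", "score": 0.61}, {"label": "fake", "score": 0.39}],
]
for name, preds in zip(["image1.jpg", "image2.jpg"], sample_results):
    label, score, needs_review = summarize(preds)
    print(f"{name}: {label} ({score:.2%}), review={needs_review}")
```

Flagging low-confidence predictions this way is one simple route to the human oversight recommended for high-stakes use; a suitable cutoff would need to be chosen on held-out data.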
+## Limitations and Biases
+### Known Limitations
+- **Dataset Bias:** The model was trained on the OpenForensics dataset, which may not represent all types of deepfakes or manipulation techniques
+- **Generalization:** Performance may degrade on deepfake generation methods not present in the training data
+- **Adversarial Robustness:** The model has not been explicitly tested against adversarial attacks
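+- **Resolution Dependency:** Inputs are resized to 224x224 pixels (the ViT input size); performance is best on images near that resolution, and fine detail in larger images may be lost in downscaling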
+### Potential Biases
+- The model's performance may vary across different:
+  - Image sources and quality levels
+  - Demographic representations in images
+  - Types of manipulation techniques
+  - Content domains (faces, landscapes, objects, etc.)
+## Ethical Considerations
+- This model should be used responsibly and not for harassment or privacy invasion
+- Decisions based on this model should involve human oversight, especially in high-stakes scenarios
+- Users should be aware that deepfake detection is an evolving field, and no model is perfect
+- False positives and false negatives can have real-world consequences
+## Citation
+If you use this model, please cite:
+```bibtex
+@misc{vit-deepfake-detector,
+  author = {YOUR_NAME},
+  title = {ViT Deepfake Detection Model},
+  year = {2024},
+  publisher = {HuggingFace},
+  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/vit-deepfake-detector}}
+}
+```
+**Disclaimer:** This model is provided for research and educational purposes. Users are responsible for ensuring compliance with applicable laws and ethical guidelines when deploying this model.