---
license: mit
language:
- en
metrics:
- accuracy
- f1
- precision
- recall
base_model:
- google/vit-base-patch16-224-in21k
library_name: transformers
tags:
- deepfake detection
- fake-image detection
---
# ViT Deepfake Detection Model
## Model Description
This is a fine-tuned Vision Transformer (ViT) model for binary image classification to detect deepfake images. The model is based on `google/vit-base-patch16-224-in21k` and has been fine-tuned on the OpenForensics dataset to distinguish between real and fake (AI-generated/manipulated) images.
## Model Details
- **Model Type:** Vision Transformer (ViT) for Image Classification
- **Base Model:** google/vit-base-patch16-224-in21k
- **Task:** Binary Image Classification (Real vs Fake Detection)
- **Language:** N/A (Computer Vision)
- **License:** MIT (per the metadata above; the base model is released under Apache 2.0)
## Intended Use
### Primary Use Cases
- Detecting AI-generated or manipulated images
- Content moderation and verification
- Research in deepfake detection
- Media authenticity verification
### Out-of-Scope Use
- This model should not be used as the sole method for making critical decisions about content authenticity
- Not intended for surveillance or privacy-invasive applications
- May not generalize well to deepfake techniques not present in the training data
## Training Data
The model was trained on the **OpenForensics dataset** with the following distribution:
- **Training Set:** 16,000 images
- **Validation Set:** 2,000 images
- **Test Set:** 2,000 images
Images were preprocessed and transformed using ViTImageProcessor with standard normalization.
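The `ViTImageProcessor` defaults for this base model (resize to 224×224, rescale to [0, 1], normalize with mean and std of 0.5) can be approximated without downloading the processor. A minimal NumPy sketch; the exact defaults are an assumption here, so verify against `processor.image_mean` and `processor.image_std`:

```python
import numpy as np
from PIL import Image

def vit_preprocess(image, size=224, mean=0.5, std=0.5):
    """Approximate ViTImageProcessor defaults: resize, rescale, normalize."""
    image = image.convert("RGB").resize((size, size), Image.BILINEAR)
    arr = np.asarray(image).astype(np.float32) / 255.0  # rescale to [0, 1]
    arr = (arr - mean) / std                            # normalize to roughly [-1, 1]
    return arr.transpose(2, 0, 1)                       # HWC -> CHW, as the model expects

# Example with a synthetic mid-gray image
dummy = Image.new("RGB", (640, 480), color=(128, 128, 128))
pixels = vit_preprocess(dummy)
print(pixels.shape)  # (3, 224, 224)
```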
## Training Procedure
### Hyperparameters
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="vit-deepfake-detector",
    per_device_train_batch_size=24,
    gradient_accumulation_steps=1,
    fp16=True,                      # mixed precision
    num_train_epochs=10,
    learning_rate=3e-5,
    weight_decay=0.02,
    warmup_ratio=0.08,
    lr_scheduler_type="cosine",
    label_smoothing_factor=0.05,
    # Optimizer: AdamW (Trainer default)
)
```
### Training Hardware
- GPU: Tesla T4
- Training Time: ~14 minutes for 10 epochs
### Data Augmentation
Standard ViT preprocessing with normalization applied via `ViTImageProcessor`.
## Performance
### Validation Set Results (Best Epoch - Epoch 5)
| Metric | Score |
|--------|-------|
| Accuracy | 96.22% |
| F1 Score | 96.22% |
| Precision | 96.30% |
| Recall | 96.22% |
### Test Set Results
| Metric | Score |
|--------|-------|
| Accuracy | **96.56%** |
### Training Progress
The model showed consistent improvement across epochs:
| Epoch | Training Loss | Validation Loss | Accuracy | F1 Score |
|-------|---------------|-----------------|----------|----------|
| 1 | 0.2259 | 0.2567 | 92.89% | 92.88% |
| 2 | 0.2002 | 0.2360 | 93.44% | 93.43% |
| 3 | 0.1388 | 0.1925 | 96.11% | 96.11% |
| 4 | 0.1322 | 0.2161 | 95.67% | 95.67% |
| 5 | 0.1182 | 0.2208 | **96.22%** | **96.22%** |
| 6-10 | 0.1170-0.1171 | 0.2132-0.2142 | 95.67-95.78% | 95.67-95.78% |
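For reference, the reported metrics can be computed directly from model predictions. A minimal NumPy sketch for the binary case; note that this version reports precision, recall, and F1 for the positive ("fake") class, while the identical accuracy and F1 values in the tables above suggest weighted averaging over both classes, so the exact averaging scheme is an assumption:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, plus precision/recall/F1 for the positive ("fake") class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    accuracy = float(np.mean(y_true == y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

m = binary_metrics([0, 1, 1, 0], [0, 1, 0, 0])
print(m)
```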
## Usage
### Loading the Model
```python
from transformers import ViTImageProcessor, ViTForImageClassification
from PIL import Image
import torch

# Load model and processor
model = ViTForImageClassification.from_pretrained("YOUR_USERNAME/vit-deepfake-detector")
processor = ViTImageProcessor.from_pretrained("YOUR_USERNAME/vit-deepfake-detector")

# Load and preprocess image
image = Image.open("path_to_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Make prediction
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class = logits.argmax(-1).item()

# Get label
labels = {0: "real", 1: "fake"}
print(f"Prediction: {labels[predicted_class]}")

# Get confidence scores
probabilities = torch.nn.functional.softmax(logits, dim=-1)
confidence = probabilities[0][predicted_class].item()
print(f"Confidence: {confidence:.2%}")
```
### Batch Prediction
```python
from transformers import pipeline

# Create classification pipeline
classifier = pipeline("image-classification", model="YOUR_USERNAME/vit-deepfake-detector")

# Predict on single image
result = classifier("path_to_image.jpg")
print(result)

# Predict on multiple images
images = ["image1.jpg", "image2.jpg", "image3.jpg"]
results = classifier(images)
for img, result in zip(images, results):
    print(f"{img}: {result}")
```
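Because this model should not be the sole basis for authenticity decisions (see Out-of-Scope Use above), one common pattern is to flag low-confidence predictions for human review. A sketch assuming the `[{"label": ..., "score": ...}]` output format of the pipeline above; the 0.90 threshold is an illustrative choice, not a calibrated value:

```python
def route_prediction(scores, review_threshold=0.90):
    """Flag a prediction for human review when the top confidence is low.

    `scores` is a list of {"label": ..., "score": ...} dicts, the format
    returned by a transformers image-classification pipeline.
    """
    top = max(scores, key=lambda s: s["score"])
    needs_review = top["score"] < review_threshold
    return {"label": top["label"], "score": top["score"], "needs_review": needs_review}

# High-confidence prediction: accept automatically
print(route_prediction([{"label": "fake", "score": 0.97}, {"label": "real", "score": 0.03}]))

# Borderline prediction: route to a human reviewer
print(route_prediction([{"label": "real", "score": 0.58}, {"label": "fake", "score": 0.42}]))
```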
## Limitations and Biases
### Known Limitations
- **Dataset Bias:** The model was trained on the OpenForensics dataset, which may not represent all types of deepfakes or manipulation techniques
- **Generalization:** Performance may degrade on deepfake generation methods not present in the training data
- **Adversarial Robustness:** The model has not been explicitly tested against adversarial attacks
- **Resolution Dependency:** Best performance on images around 224x224 pixels (ViT input size)
### Potential Biases
- The model's performance may vary across different:
- Image sources and quality levels
- Demographic representations in images
- Types of manipulation techniques
- Content domains (faces, landscapes, objects, etc.)
## Ethical Considerations
- This model should be used responsibly and not for harassment or privacy invasion
- Decisions based on this model should involve human oversight, especially in high-stakes scenarios
- Users should be aware that deepfake detection is an evolving field, and no model is perfect
- False positives and false negatives can have real-world consequences
## Citation
If you use this model, please cite:
```bibtex
@misc{vit-deepfake-detector,
  author       = {YOUR_NAME},
  title        = {ViT Deepfake Detection Model},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/vit-deepfake-detector}}
}
```
## Author
- Dr. Lucy Liu
- Muhammad Hamza Sohail
- Ayaan Mohammed
- Shadab Karim
- Kirti Dhir
## Disclaimer
This model is provided for research and educational purposes only. Users are responsible for ensuring compliance with applicable laws and ethical guidelines when deploying it.