hamzenium committed on
Commit
bc48c15
·
verified ·
1 Parent(s): 739e241

Update README.md

Files changed (1)
  1. README.md +181 -39
README.md CHANGED
@@ -1,39 +1,181 @@
1
- ---
2
- ---
3
- license: mit
4
- language:
5
- - en
6
- base_model:
7
- - google/vit-base-patch16-224
8
- pipeline_tag: image-classification
9
- model_name: ViT Deepfake Detector
10
- model_creator: Hamza Sohail, Ayaan Mohammed, Shadab Karim, Kirti Dhir
11
- model_type: vision-transformer
12
- library_name: transformers
13
- library_version: "4.40.0"
14
- inference: true
15
- model_description: |
16
- A fine-tuned Vision Transformer (`vit-base-patch16-224`) for classifying real vs. fake images. Trained on FaceForensics++, Celeb-DF, DFDC, and custom samples. Outputs real/fake probabilities for input images.
17
-
18
- training_details: |
19
- - Epochs: 10
20
- - Optimizer: AdamW
21
- - Loss: CrossEntropy
22
- - LR: 5e-5
23
- - Batch size: 32
24
- - GPU: Tesla T4
25
-
26
- evaluation: |
27
- Evaluated on 10,000 images:
28
- - Accuracy: 95.7%
29
- - Precision/Recall/F1: 95.7%
30
-
31
- intended_uses: |
32
- For fake image detection in research, moderation, and education. Not for legal/critical decisions without further verification.
33
- tags:
34
- - deepfake
35
- - fakeimages
36
- - detector
37
- - vit
38
- - computer-vision
39
- - deep-learning
1
+ # ViT Deepfake Detection Model
2
+
3
+ ## Model Description
4
+
5
+ This is a fine-tuned Vision Transformer (ViT) model for binary image classification to detect deepfake images. The model is based on `google/vit-base-patch16-224-in21k` and has been fine-tuned on the OpenForensics dataset to distinguish between real and fake (AI-generated/manipulated) images.
6
+
7
+ ## Model Details
8
+
9
+ - **Model Type:** Vision Transformer (ViT) for Image Classification
10
+ - **Base Model:** google/vit-base-patch16-224-in21k
11
+ - **Task:** Binary Image Classification (Real vs Fake Detection)
12
+ - **Language:** N/A (Computer Vision)
13
+ - **License:** Apache 2.0
14
+
15
+ ## Intended Use
16
+
17
+ ### Primary Use Cases
18
+ - Detecting AI-generated or manipulated images
19
+ - Content moderation and verification
20
+ - Research in deepfake detection
21
+ - Media authenticity verification
22
+
23
+ ### Out-of-Scope Use
24
+ - This model should not be used as the sole method for making critical decisions about content authenticity
25
+ - Not intended for surveillance or privacy-invasive applications
26
+ - May not generalize well to deepfake techniques not present in the training data
27
+
28
+ ## Training Data
29
+
30
+ The model was trained on the **OpenForensics dataset** with the following distribution:
31
+
32
+ - **Training Set:** 16,000 images
33
+ - **Validation Set:** 2,000 images
34
+ - **Test Set:** 2,000 images
35
+
36
+ Images were preprocessed and transformed using ViTImageProcessor with standard normalization.
37
+
38
+ ## Training Procedure
39
+
40
+ ### Hyperparameters
41
+
42
+ ```text
43
+ Training Arguments:
44
+ - Batch Size: 24 per device
45
+ - Gradient Accumulation Steps: 1
46
+ - Mixed Precision: FP16
47
+ - Number of Epochs: 10
48
+ - Learning Rate: 3e-5
49
+ - Weight Decay: 0.02
50
+ - Warmup Ratio: 0.08
51
+ - LR Scheduler: Cosine
52
+ - Label Smoothing: 0.05
53
+ - Optimizer: AdamW (default)
54
+ ```
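The hyperparameters above imply a concrete learning-rate curve. The following is a rough sketch only, assuming single-device training, no dropped last batch, and the common linear-warmup plus half-cosine decay shape. The names `lr_at_step`, `TRAIN_IMAGES`, and the other constants are illustrative and are not taken from the original training script.

```python
import math

# Values copied from the README; single-device training is an assumption.
TRAIN_IMAGES = 16_000
BATCH_SIZE = 24
GRAD_ACCUM = 1
EPOCHS = 10
PEAK_LR = 3e-5
WARMUP_RATIO = 0.08

# One optimizer step per batch (after gradient accumulation).
steps_per_epoch = math.ceil(TRAIN_IMAGES / (BATCH_SIZE * GRAD_ACCUM))
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at_step(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay towards zero."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at_step(step):.2e}")
```

Under these assumptions the learning rate reaches its peak of 3e-5 after the warmup steps and decays smoothly for the rest of training, which may help explain why the metrics change little in the final epochs.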
55
+
56
+ ### Training Hardware
57
+ - GPU: Tesla T4
58
+ - Training Time: ~14 minutes for 10 epochs
59
+
60
+ ### Data Augmentation
61
+ Only standard ViT preprocessing (resizing and normalization via `ViTImageProcessor`) was applied.
62
+
63
+ ## Performance
64
+
65
+ ### Validation Set Results (Best Epoch - Epoch 5)
66
+
67
+ | Metric | Score |
68
+ |--------|-------|
69
+ | Accuracy | 96.22% |
70
+ | F1 Score | 96.22% |
71
+ | Precision | 96.30% |
72
+ | Recall | 96.22% |
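Recall equal to accuracy is what support-weighted averaging produces, because weighted recall is algebraically the same quantity as accuracy. The sketch below assumes weighted averaging was used (the README does not say) and uses hypothetical confusion-matrix counts purely for illustration; `weighted_metrics` is an illustrative helper, not part of the model's evaluation code.

```python
def weighted_metrics(confusion):
    """Compute accuracy and support-weighted precision, recall, and F1.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    precision = recall = f1 = 0.0
    for c in range(n_classes):
        support = sum(confusion[c])  # number of true samples of class c
        predicted = sum(confusion[i][c] for i in range(n_classes))
        tp = confusion[c][c]
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total  # weight each class by its share of samples
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {
        "accuracy": correct / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts (rows: true real, true fake); not the model's actual results.
example = [[95, 5], [3, 97]]
metrics = weighted_metrics(example)
print(metrics)
```

In this toy example weighted recall and accuracy coincide, while weighted precision differs slightly, which is the same pattern as in the table.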
73
+
74
+ ### Test Set Results
75
+
76
+ | Metric | Score |
77
+ |--------|-------|
78
+ | Accuracy | **96.56%** |
79
+
80
+ ### Training Progress
81
+
82
+ Training loss fell steadily through epoch 6 and then flattened; validation accuracy peaked at epoch 5, while validation loss was lowest at epoch 3:
83
+
84
+ | Epoch | Training Loss | Validation Loss | Accuracy | F1 Score |
85
+ |-------|---------------|-----------------|----------|----------|
86
+ | 1 | 0.2259 | 0.2567 | 92.89% | 92.88% |
87
+ | 2 | 0.2002 | 0.2360 | 93.44% | 93.43% |
88
+ | 3 | 0.1388 | 0.1925 | 96.11% | 96.11% |
89
+ | 4 | 0.1322 | 0.2161 | 95.67% | 95.67% |
90
+ | 5 | 0.1182 | 0.2208 | **96.22%** | **96.22%** |
91
+ | 6-10 | 0.1170-0.1171 | 0.2132-0.2142 | 95.67-95.78% | 95.67-95.78% |
92
+
93
+ ## Usage
94
+
95
+ ### Loading the Model
96
+
97
+ ```python
98
+ from transformers import ViTImageProcessor, ViTForImageClassification
99
+ from PIL import Image
100
+ import torch
101
+
102
+ # Load model and processor
103
+ model = ViTForImageClassification.from_pretrained("YOUR_USERNAME/vit-deepfake-detector")
104
+ processor = ViTImageProcessor.from_pretrained("YOUR_USERNAME/vit-deepfake-detector")
105
+
106
+ # Load and preprocess image
107
+ image = Image.open("path_to_image.jpg").convert("RGB")  # ensure 3 channels (handles grayscale/RGBA inputs)
108
+ inputs = processor(images=image, return_tensors="pt")
109
+
110
+ # Make prediction
111
+ with torch.no_grad():
112
+     outputs = model(**inputs)
113
+ logits = outputs.logits
114
+ predicted_class = logits.argmax(-1).item()
115
+
116
+ # Get label
117
+ labels = {0: "real", 1: "fake"}
118
+ print(f"Prediction: {labels[predicted_class]}")
119
+
120
+ # Get confidence scores
121
+ probabilities = torch.nn.functional.softmax(logits, dim=-1)
122
+ confidence = probabilities[0][predicted_class].item()
123
+ print(f"Confidence: {confidence:.2%}")
124
+ ```
125
+
126
+ ### Batch Prediction
127
+
128
+ ```python
129
+ from transformers import pipeline
130
+
131
+ # Create classification pipeline
132
+ classifier = pipeline("image-classification", model="YOUR_USERNAME/vit-deepfake-detector")
133
+
134
+ # Predict on single image
135
+ result = classifier("path_to_image.jpg")
136
+ print(result)
137
+
138
+ # Predict on multiple images
139
+ images = ["image1.jpg", "image2.jpg", "image3.jpg"]
140
+ results = classifier(images)
141
+ for img, result in zip(images, results):
142
+     print(f"{img}: {result}")
143
+ ```
144
+
145
+ ## Limitations and Biases
146
+
147
+ ### Known Limitations
148
+ - **Dataset Bias:** The model was trained on the OpenForensics dataset, which may not represent all types of deepfakes or manipulation techniques
149
+ - **Generalization:** Performance may degrade on deepfake generation methods not present in the training data
150
+ - **Adversarial Robustness:** The model has not been explicitly tested against adversarial attacks
151
+ - **Resolution Dependency:** The model operates on 224x224 pixel inputs (the ViT input size); performance may be lower on images whose native resolution differs greatly from this
152
+
153
+ ### Potential Biases
154
+ - The model's performance may vary across different:
155
+   - Image sources and quality levels
156
+   - Demographic representations in images
157
+   - Types of manipulation techniques
158
+   - Content domains (faces, landscapes, objects, etc.)
159
+
160
+ ## Ethical Considerations
161
+
162
+ - This model should be used responsibly and not for harassment or privacy invasion
163
+ - Decisions based on this model should involve human oversight, especially in high-stakes scenarios
164
+ - Users should be aware that deepfake detection is an evolving field, and no model is perfect
165
+ - False positives and false negatives can have real-world consequences
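One way to put human oversight into practice is to act automatically only on high-confidence predictions and send everything else to a reviewer. The sketch below is illustrative: the `triage` helper, the `needs_human_review` label, and the 0.90 threshold are all hypothetical choices, and the label order follows the README's `{0: "real", 1: "fake"}` mapping.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def triage(logits, labels=("real", "fake"), threshold=0.90):
    """Return (decision, confidence); defer to a human below the threshold.

    The threshold is a hypothetical value and should be tuned on held-out data.
    """
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    decision = labels[idx] if probs[idx] >= threshold else "needs_human_review"
    return decision, probs[idx]


print(triage([4.0, -4.0]))   # confident prediction
print(triage([0.2, 0.1]))    # ambiguous prediction, deferred to a reviewer
```

The logits here are hand-written examples; in practice they would come from `outputs.logits[0].tolist()` in the loading example.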
166
+
167
+ ## Citation
168
+
169
+ If you use this model, please cite:
170
+
171
+ ```bibtex
172
+ @misc{vit-deepfake-detector,
173
+ author = {YOUR_NAME},
174
+ title = {ViT Deepfake Detection Model},
175
+ year = {2024},
176
+ publisher = {HuggingFace},
177
+ howpublished = {\url{https://huggingface.co/YOUR_USERNAME/vit-deepfake-detector}}
178
+ }
179
+ ```
180
+
181
+ **Disclaimer:** This model is provided for research and educational purposes. Users are responsible for ensuring compliance with applicable laws and ethical guidelines when deploying this model.