pranav2711 committed on
Commit b6c2aab · verified · 1 Parent(s): d54ebb8

Delete README.md

Files changed (1)
  1. README.md +0 -86
README.md DELETED
@@ -1,86 +0,0 @@
- ---
- license: apache-2.0
- ---
- ## Vision Transformer (ViT) Models for Digital Forensics
-
- This repository provides Vision Transformer (ViT) models fine-tuned to classify image frames extracted from the FaceForensics++ dataset as manipulated (fake) or authentic (real). The models were trained on an Intel® ARC GPU (XPU-enabled) and are intended for binary image classification in digital forensics workflows.
-
- ---
-
- ## 🧠 Models Trained
-
- | Model Name | Pretrained On | Patch Size | Parameters |
- |---------------------------|-----------------------|------------|------------|
- | `vit_tiny_patch16_224` | ImageNet-21k | 16×16 | ~5.7M |
- | `vit_tiny_patch32_224` | ImageNet-21k | 32×32 | ~5.6M |
- | `vit_small_patch32_224` | AugReg + IN21k + IN1k | 32×32 | ~22M |
- | `vit_large_patch16_224` | ImageNet-21k | 16×16 | ~304M |
- | `vit_large_patch32_224` | ImageNet-21k | 32×32 | ~304M |
-
- ---
-
- ## 🗂️ Dataset
-
- - **Name**: DeepFake Detection (DFD)
- - **Source**: [Kaggle DFD Dataset]
- - **Classes**: `real`, `fake`
- - **Input**: Extracted video frames resized to 224×224 RGB images
- - **Preprocessing**:
-   - Resizing and normalization using `torchvision.transforms`
-   - Structured into `train/real`, `train/fake`, `val/real`, `val/fake`
-
- ---
-
- ## ⚙️ Hardware & Environment
-
- - **Accelerator**: Intel® ARC GPU (XPU via Intel Extension for PyTorch)
- - **Frameworks**:
-   - PyTorch 2.7.0 + XPU backend
-   - torchvision 0.22.0
-   - timm for pretrained ViT models
- - **OS**: Windows 11
- - **Memory Consideration**: `vit_huge_patch14_224` requires a large amount of GPU memory; it was tested on an Intel ARC A770 (16 GB) and NPU Boost
-
- ---
-
- ## ✅ Use Case: Deepfake Frame Detection
-
- These models are designed to identify manipulated media content at the frame level. Use cases include:
-
- - 🔍 Video forensics
- - 🎞️ Deepfake screening and flagging pipelines
- - 🧪 Data validation for machine learning datasets
- - 📡 Real-time frame-level media authentication
-
- They are intended for use in digital forensics, content moderation, and research scenarios where image authenticity is critical.
-
- ---
-
- ## 📊 Results
-
- | Model | Train Accuracy | Validation Accuracy |
- |-------|----------------|---------------------|
- | [vit_large_patch16_224](https://huggingface.co/pranav2711/VisionTransformerDigitalForensics/blob/main/vit_large_patch16_224.pth) | **94.89%** | **91.22%** |
- | [vit_large_patch32_224](https://huggingface.co/pranav2711/VisionTransformerDigitalForensics/blob/main/vit_large_patch32_224.pth) | 91.31% | 89.23% |
- | [vit_tiny_patch16_224](https://huggingface.co/pranav2711/VisionTransformerDigitalForensics/blob/main/vit_tiny_patch16_224.pth) | 92.41% | 89.20% |
- | [vit_small_patch32_224](https://huggingface.co/pranav2711/VisionTransformerDigitalForensics/blob/main/vit_small_patch32_224.pth) | 91.38% | 88.29% |
- | [vit_small_patch16_224](https://huggingface.co/pranav2711/VisionTransformerDigitalForensics/blob/main/vit_small_patch16_224.pth) | 80.67% | 81.25% |
- | [vit_base_patch16_224](https://huggingface.co/pranav2711/VisionTransformerDigitalForensics/blob/main/vit_base_patch16_224.pth) | 90.65% | 85.36% |
- | [vit_base_patch32_224](https://huggingface.co/pranav2711/VisionTransformerDigitalForensics/blob/main/vit_base_patch32_224.pth) | 79.54% | 79.54% |
-
- **vit_large_patch16_224**
- This model achieved the highest validation accuracy (91.22%) with strong training stability and good generalization: its training accuracy exceeds its validation accuracy by only 3.67 percentage points. It is recommended as the final model for deployment or downstream tasks.
-
- ---
-
- ## 📄 License
-
- These models are licensed under the [CreativeML OpenRAIL-M License](https://huggingface.co/spaces/CompVis/stable-diffusion-license).
- The license allows responsible research and commercial use but **strictly prohibits**:
-
- - Harassment, surveillance, or profiling.
- - Generating misleading or harmful content (e.g., deepfakes for impersonation).
- - Use in political campaigns or autonomous weapons.
-
- Please read the license carefully before using the models.
-
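The patch sizes listed in the deleted README's model table determine how many tokens each 224×224 frame becomes. The sketch below illustrates that arithmetic only. `num_patch_tokens` is a hypothetical helper, not code from this repository, and it assumes the standard ViT layout of non-overlapping square patches plus one class token.

```python
def num_patch_tokens(image_size: int, patch_size: int, add_cls_token: bool = True) -> int:
    """Tokens seen by a ViT for a square image cut into non-overlapping square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size  # patches along one edge
    tokens = per_side * per_side         # total patch tokens
    return tokens + 1 if add_cls_token else tokens  # optional [CLS] token


if __name__ == "__main__":
    for patch in (16, 32):
        n = num_patch_tokens(224, patch)
        # self-attention builds an n x n score matrix per head
        print(f"patch{patch}: {n} tokens, {n * n} attention scores per head")
```

At 224×224 this gives 197 tokens for the patch16 models and 50 for the patch32 models. Each attention map for patch16 therefore has about 15.5 times as many entries, which is one reason the patch32 variants are cheaper per frame.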
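The deleted README describes a `train/real`, `train/fake`, `val/real`, `val/fake` layout, which matches what `torchvision.datasets.ImageFolder` expects. The README does not say that `ImageFolder` was used. If it was, with default class discovery, each class's output index follows the sorted folder names. The sketch below reproduces that rule using only the standard library. `folder_class_to_idx` is a hypothetical helper.

```python
import os
import tempfile
from typing import Dict


def folder_class_to_idx(root: str) -> Dict[str, int]:
    """Map class-folder names to indices the way ImageFolder does by default:
    list subdirectories, sort the names, number them from 0."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    if not classes:
        raise FileNotFoundError(f"no class folders found in {root}")
    return {name: idx for idx, name in enumerate(classes)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        train_dir = os.path.join(root, "train")
        for cls in ("real", "fake"):
            os.makedirs(os.path.join(train_dir, cls))
        print(folder_class_to_idx(train_dir))  # {'fake': 0, 'real': 1}
```

Under that assumption, index 0 is `fake` and index 1 is `real`, regardless of the order in which the folders were created. This matters when interpreting the two logits produced by a checkpoint.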
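The deleted README presents these models as frame-level classifiers for video screening but does not specify how frame predictions combine into a per-video decision. The sketch below shows one possible policy under stated assumptions:

1. Apply a softmax to each frame's two logits.
2. Take the `fake` probability, at index 0 under the `ImageFolder` ordering assumption.
3. Flag the video when at least a chosen fraction of its frames exceed a threshold.

`fake_probability`, `flag_video`, and the default `threshold` and `min_fraction` values are hypothetical and would need tuning on held-out data.

```python
import math
from typing import List, Sequence, Tuple


def fake_probability(logits: Sequence[float], fake_index: int = 0) -> float:
    """Softmax over the class logits, returning the probability at fake_index.

    fake_index=0 assumes ImageFolder-style alphabetical classes ("fake", "real").
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[fake_index] / sum(exps)


def flag_video(
    frame_logits: List[Sequence[float]],
    threshold: float = 0.5,     # hypothetical per-frame cutoff
    min_fraction: float = 0.3,  # hypothetical share of suspicious frames
    fake_index: int = 0,
) -> Tuple[bool, float]:
    """Flag a video when at least min_fraction of its frames exceed threshold."""
    if not frame_logits:
        raise ValueError("need at least one frame")
    probs = [fake_probability(logits, fake_index) for logits in frame_logits]
    suspicious = sum(p > threshold for p in probs)
    fraction = suspicious / len(probs)
    return fraction >= min_fraction, fraction


if __name__ == "__main__":
    # Logits for three frames in (fake, real) order.
    frames = [[2.0, 0.0], [0.0, 2.0], [0.0, 2.0]]
    # One of three frames looks fake, which meets min_fraction=0.3.
    print(flag_video(frames))
```

A fraction-of-frames rule is used here instead of a plain mean so that a short manipulated segment in a long video can still trigger a flag. Which rule suits a given pipeline is a design choice, not something the README specifies.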
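The recommendation of `vit_large_patch16_224` follows from the Results table in the deleted README. The sketch below copies those numbers verbatim and ranks the models by validation accuracy. It also reports the train-minus-validation gap in percentage points. `rank_by_validation` is a hypothetical helper.

```python
# Train / validation accuracy (%) copied from the Results table in the deleted README.
RESULTS = {
    "vit_large_patch16_224": (94.89, 91.22),
    "vit_large_patch32_224": (91.31, 89.23),
    "vit_tiny_patch16_224": (92.41, 89.20),
    "vit_small_patch32_224": (91.38, 88.29),
    "vit_small_patch16_224": (80.67, 81.25),
    "vit_base_patch16_224": (90.65, 85.36),
    "vit_base_patch32_224": (79.54, 79.54),
}


def rank_by_validation(results):
    """Return (name, val_acc, train_minus_val_gap) tuples, best validation accuracy first."""
    rows = [
        (name, val, round(train - val, 2))  # gap in percentage points
        for name, (train, val) in results.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, val, gap in rank_by_validation(RESULTS):
        print(f"{name:24s} val={val:6.2f}%  gap={gap:+.2f} pts")
```

The gap column shows 3.67 points for `vit_large_patch16_224` and 5.29 for `vit_base_patch16_224`. The negative gap for `vit_small_patch16_224` (-0.58) means its validation accuracy exceeded its training accuracy.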