manar54 committed on
Commit 87f8126 · verified · 1 Parent(s): 5408850

Upload 3 files

Files changed (3)
  1. README.md +34 -64
  2. predict.py +383 -0
  3. requirements.txt +12 -7
README.md CHANGED
@@ -1,97 +1,67 @@
-
  # Deepfake Detection Pipeline

- ## Overview
-
- This project is a deepfake detection pipeline combining a backbone model with a VLM (Vision-Language Model) for reasoning.
- It produces:
+ A complete deepfake detection system that combines a backbone classifier with Vision-Language Model (VLM) reasoning for explainable predictions.

- * **Authenticity score** (0.0–1.0; 0 = real, 1 = fake)
- * **Manipulation type**: "Artificial", "Deepfake", or "Real"
- * **VLM explanation**: 2 sentences describing why the image is considered fake
+ ## Features

- ---
+ - **Backbone Classification**: Uses a SigLIP model to classify images as Artificial, Deepfake, or Real
+ - **Forensic Signal Extraction**: Analyzes texture, frequency, and compression artifacts
+ - **Conditional VLM Analysis**: Provides natural language explanations for non-real images using Qwen2-VL-2B
+ - **Efficient Processing**: Runs the VLM only on images classified as non-real or low-confidence real

  ## Installation

- 1. Clone the repository:
-
- ```bash
- git clone https://huggingface.co/manar54/DEFAKE
- ```
-
- 2. Install dependencies:
-
  ```bash
  pip install -r requirements.txt
  ```

- *requirements.txt* includes:
-
- ```
- transformers
- timm
- accelerate
- scikit-learn
- qwen-vl-utils
- opencv-python
- Pillow
- scikit-image
- numpy
- torch
- torchvision
- ```
-
- ---
-
  ## Usage

- Run the pipeline on a folder of images:
-
  ```bash
- python predict.py --input_dir /path/to/images --output_file predictions.json
+ python predict.py --input_dir /path/to/test_images --output_file predictions.json
  ```

- **Arguments:**
+ ### Arguments

- * `--input_dir`: Path to the folder containing images
- * `--output_file`: Path for saving predictions in JSON format
+ - `--input_dir` (required): Path to the folder containing the images to analyze
+ - `--output_file` (required): Path to the output JSON file for predictions
+ - `--real_threshold` (optional): Confidence threshold for "Real" classification (default: 0.90)

- The script will:
+ ### Example

- 1. Classify each image using the backbone model
- 2. Run VLM reasoning for non-real images
- 3. Save predictions and explanations in `predictions.json`
+ ```bash
+ python predict.py --input_dir ./test_images --output_file results.json --real_threshold 0.85
+ ```

- ---
+ ## Output Format

- ## Example Output (JSON)
+ The script generates a JSON file with one prediction per image; `authenticity_score` ranges from 0 (real) to 1 (fake):

  ```json
  [
    {
-     "image_name": "000001.jpg",
+     "image_name": "example.jpg",
      "manipulation_type": "Deepfake",
-     "authenticity_score": 0.87,
-     "vlm_reasoning": "The subject's skin is unnaturally smooth on the cheeks and forehead. Shadows and high-frequency patterns indicate GAN artifacts."
+     "authenticity_score": 0.8542,
+     "explanation": "The image exhibits unnatural texture smoothing in facial regions. Frequency analysis reveals artifacts consistent with GAN-based synthesis."
    }
  ]
  ```

- ---
+ ## Requirements

- ## CLI Reference
+ - Python 3.8+
+ - CUDA-capable GPU (recommended for faster processing)
+ - ~8GB GPU memory for VLM inference

- ```python
- # -----------------------
- # CLI
- # -----------------------
- if __name__ == "__main__":
-     import argparse
-     parser = argparse.ArgumentParser()
-     parser.add_argument("--input_dir", required=True)
-     parser.add_argument("--output_file", required=True)
-     args = parser.parse_args()
+ ## Model Details

-     main(args.input_dir, args.output_file)
- ```
+ - **Backbone**: prithivMLmods/AI-vs-Deepfake-vs-Real-9999 (SigLIP)
+ - **VLM**: Qwen/Qwen2-VL-2B-Instruct
+ - **Forensic Analysis**: Laplacian, LBP, FFT, DCT
+
+ ## Notes
+
+ - The VLM runs only on images classified as non-real, or as Real with confidence below `--real_threshold`
+ - The first run downloads the models (~2-4GB total)
+ - Supported image formats: .jpg, .jpeg, .png
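The conditional VLM rule described in the Notes can be sketched as a small predicate. This is an illustration rather than the code in `predict.py`: the helper name `needs_vlm` is hypothetical, and it assumes `authenticity_score` is `1 - P(Real)` (as computed in `classify_image`), so the confidence in a "Real" label is `1 - authenticity_score`.

```python
def needs_vlm(manipulation_type: str,
              authenticity_score: float,
              real_threshold: float = 0.90) -> bool:
    """Hypothetical helper: decide whether an image should be sent to the VLM.

    Assumes authenticity_score = 1 - P(Real), i.e. 0 = real, 1 = fake.
    """
    # Any non-real label always gets an explanation.
    if manipulation_type != "Real":
        return True
    # For "Real" labels, compare the confidence in "Real" with the threshold.
    real_confidence = 1.0 - authenticity_score
    return real_confidence < real_threshold


if __name__ == "__main__":
    for label, score in [("Deepfake", 0.8542), ("Real", 0.05), ("Real", 0.30)]:
        print(label, score, "-> VLM" if needs_vlm(label, score) else "-> skip VLM")
```

Lowering `real_threshold` (for example to 0.85, as in the usage example) sends fewer "Real" predictions to the VLM, which trades explanation coverage for speed.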
predict.py ADDED
@@ -0,0 +1,383 @@
+ #!/usr/bin/env python3
+ """
+ COMPLETE DEEPFAKE DETECTION PIPELINE WITH CONDITIONAL VLM REASONING
+ """
+
+ # ============================================================================
+ # SECTION 1: SETUP AND DEPENDENCIES
+ # ============================================================================
+
+ import os
+ import torch
+ import numpy as np
+ import cv2
+ import json
+ import argparse
+ from PIL import Image
+ from typing import Dict, List
+ from transformers import AutoImageProcessor, SiglipForImageClassification
+ from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
+ from skimage.feature import local_binary_pattern
+ from scipy.fftpack import fft2, fftshift, dct
+ from qwen_vl_utils import process_vision_info
+
+ print("✓ All dependencies imported successfully!")
+
+ # ============================================================================
+ # SECTION 2: BACKBONE CLASSIFIER INITIALIZATION
+ # ============================================================================
+
+ MODEL_NAME = "prithivMLmods/AI-vs-Deepfake-vs-Real-9999"
+
+ print(f"Loading backbone model: {MODEL_NAME}")
+ processor = AutoImageProcessor.from_pretrained(MODEL_NAME)
+ model = SiglipForImageClassification.from_pretrained(MODEL_NAME)
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model = model.to(device)
+ model.eval()
+
+ CLASS_NAMES = ["Artificial", "Deepfake", "Real"]
+
+ print(f"✓ Backbone model loaded successfully on {device}!")
+
+ # ============================================================================
+ # SECTION 3: FORENSIC SIGNAL EXTRACTION FUNCTIONS
+ # ============================================================================
+
+ def compute_texture_laplacian(gray):
+     """
+     Measures texture sharpness and natural variation.
+     Low variance → unnaturally smooth regions (common in synthesis).
+     """
+     lap = cv2.Laplacian(gray, cv2.CV_64F)
+     return float(lap.var())
+
+
+ def compute_lbp(gray):
+     """
+     Local Binary Patterns (LBP)
+     Captures micro-texture irregularities.
+     Low variance often indicates synthetic or filtered textures.
+     """
+     lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
+     return float(np.var(lbp))
+
+
+ def compute_fft(gray):
+     """
+     Frequency domain analysis using FFT.
+     Detects unnatural spectral energy caused by upsampling,
+     diffusion models, or GAN artifacts.
+     """
+     spectrum = fftshift(fft2(gray))
+     magnitude = np.log(np.abs(spectrum) + 1)
+     return float(np.mean(magnitude))
+
+
+ def compute_dct(gray):
+     """
+     Discrete Cosine Transform (DCT) analysis.
+     Captures JPEG compression inconsistencies introduced
+     by splicing, in-painting, or recompression.
+     """
+     gray = np.float32(gray) / 255.0
+     # 2-D DCT: transform along one axis, then the other
+     d = dct(dct(gray.T, norm="ortho").T, norm="ortho")
+     # Spread of the low-frequency 40x40 coefficient block
+     return float(np.std(d[:40, :40]))
+
+
+ def extract_forensic_signals(image_path):
+     """
+     Runs all forensic signal extractors on an image.
+     Returns a dictionary of low-level forensic measurements.
+     """
+     img = cv2.imread(image_path)
+     if img is None:
+         # cv2.imread returns None (it does not raise) for unreadable files
+         raise ValueError(f"Could not read image: {image_path}")
+     gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
+
+     return {
+         "texture_laplacian": compute_texture_laplacian(gray),
+         "lbp_texture": compute_lbp(gray),
+         "fft_frequency": compute_fft(gray),
+         "dct_compression": compute_dct(gray)
+     }
+
+ print("✓ Forensic signal functions defined!")
+
+ # ============================================================================
+ # SECTION 4: BACKBONE CLASSIFICATION FUNCTION
+ # ============================================================================
+
+ def classify_image(image_path):
+     """
+     Classify image using backbone model.
+     Returns the predicted label and an authenticity score,
+     defined as 1 - P(Real) (0 = real, 1 = fake).
+     """
+     # Load image
+     image = Image.open(image_path).convert("RGB")
+
+     # Preprocess
+     inputs = processor(images=image, return_tensors="pt").to(device)
+
+     # Forward pass
+     with torch.no_grad():
+         outputs = model(**inputs)
+         logits = outputs.logits
+         probs = torch.softmax(logits, dim=1).squeeze().cpu().numpy()
+
+     # Get highest probability and label
+     max_idx = int(np.argmax(probs))
+     manipulation_type = CLASS_NAMES[max_idx]
+
+     prob_real = float(probs[CLASS_NAMES.index("Real")])
+     authenticity_score = float(1.0 - prob_real)
+
+     return {
+         "manipulation_type": manipulation_type,
+         "authenticity_score": authenticity_score
+     }
+
+ print("✓ Backbone classification function defined!")
+
+ # ============================================================================
+ # SECTION 5: VLM ANALYZER CLASS
+ # ============================================================================
+
+ import re  # used below to split the VLM output into sentences
+
+ class VLMAnalyzer:
+     """
+     Qwen2-VL-2B analyzer.
+     Only runs if backbone predicts NON-REAL or low-confidence REAL.
+     Output: EXACTLY two sentences explaining why the image is not real.
+     """
+
+     def __init__(self, device: str = "cuda"):
+         self.device = device
+         self.model_name = "Qwen/Qwen2-VL-2B-Instruct"
+
+         print(f"Loading VLM: {self.model_name}")
+         self.model = Qwen2VLForConditionalGeneration.from_pretrained(
+             self.model_name,
+             torch_dtype=torch.float16,
+             device_map="auto"
+         )
+         self.processor = AutoProcessor.from_pretrained(self.model_name)
+         print("✓ VLM loaded successfully!")
+
+     def _create_prompt(self, backbone_result: Dict, signals: Dict) -> str:
+         """
+         Prompt focused ONLY on explaining why the image is NOT real.
+         """
+         return f"""You are an expert forensic image analyst.
+
+ This image has been classified as NOT REAL by an automated detection system.
+
+ Model prediction: {backbone_result['manipulation_type']}
+ Confidence: {backbone_result['authenticity_score']:.2%}
+
+ Forensic signals:
+ - Texture Laplacian: {signals['texture_laplacian']:.2f}
+ - LBP Texture Variance: {signals['lbp_texture']:.2f}
+ - FFT Frequency Energy: {signals['fft_frequency']:.2f}
+ - DCT Compression Std: {signals['dct_compression']:.4f}
+
+ TASK:
+ Explain WHY this image is not real.
+ Based on what can be visually observed in the image, explain why the image is not authentic.
+ Describe concrete visual or physical inconsistencies (e.g., texture behavior, edges, lighting, frequency artifacts).
+ Point out specific visual or physical inconsistencies that indicate synthetic or manipulated content.
+
+ RULES:
+ - Respond with EXACTLY two sentences
+ - Plain text only
+ - Do NOT mention probabilities, scores, or model confidence.
+ - No bullet points
+ - Do NOT say "this image may be real"
+ - Do NOT mention uncertainty
+ - Focus ONLY on manipulation evidence
+ - Be very specific to the content of THIS image.
+
+
+ Response:"""
+
+     def analyze(
+         self,
+         image_path: str,
+         backbone_result: Dict,
+         signals: Dict
+     ) -> str:
+         """
+         Generate a two-sentence explanation for an image.
+         run_pipeline decides which images reach the VLM, so low-confidence
+         "Real" predictions are analyzed here as well.
+         """
+         try:
+             prompt_text = self._create_prompt(backbone_result, signals)
+
+             messages = [
+                 {
+                     "role": "user",
+                     "content": [
+                         {"type": "image", "image": image_path},
+                         {"type": "text", "text": prompt_text}
+                     ]
+                 }
+             ]
+
+             text = self.processor.apply_chat_template(
+                 messages,
+                 tokenize=False,
+                 add_generation_prompt=True
+             )
+
+             image_inputs, video_inputs = process_vision_info(messages)
+
+             inputs = self.processor(
+                 text=[text],
+                 images=image_inputs,
+                 videos=video_inputs,
+                 padding=True,
+                 return_tensors="pt"
+             ).to(self.device)
+
+             with torch.no_grad():
+                 # Greedy decoding; temperature only applies when sampling
+                 generated_ids = self.model.generate(
+                     **inputs,
+                     max_new_tokens=128,
+                     do_sample=False
+                 )
+
+             # Drop the prompt tokens so only the generated answer is decoded
+             generated_ids_trimmed = [
+                 out_ids[len(in_ids):]
+                 for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+             ]
+
+             output_text = self.processor.batch_decode(
+                 generated_ids_trimmed,
+                 skip_special_tokens=True,
+                 clean_up_tokenization_spaces=False
+             )[0].strip()
+
+             # Keep at most two sentences; split only after sentence-ending
+             # punctuation followed by whitespace, so decimals stay intact
+             sentences = [
+                 s.strip()
+                 for s in re.split(r"(?<=[.!?])\s+", output_text)
+                 if s.strip()
+             ]
+             output_text = " ".join(sentences[:2])
+             if output_text and output_text[-1] not in ".!?":
+                 output_text += "."
+
+             return output_text
+
+         except Exception as e:
+             print(f"⚠ VLM error: {e}")
+             return (
+                 "The image contains visual inconsistencies that are not consistent with natural image formation. "
+                 "These artifacts align with patterns commonly seen in synthetic or manipulated imagery."
+             )
+
+ print("✓ VLM Analyzer class defined!")
+
+ # ============================================================================
+ # SECTION 6: FULL PIPELINE EXECUTION
+ # ============================================================================
+
+ def run_pipeline(
+     image_dir: str,
+     output_json: str = "predictions.json",
+     real_threshold: float = 0.90
+ ):
+     """
+     Runs full deepfake detection pipeline on all images in a directory.
+     """
+
+     vlm = VLMAnalyzer(device=device)
+     results = []
+
+     image_files = [
+         f for f in os.listdir(image_dir)
+         if f.lower().endswith((".jpg", ".jpeg", ".png"))
+     ]
+
+     print(f"\n📂 Found {len(image_files)} images to process\n")
+
+     for image_name in image_files:
+         image_path = os.path.join(image_dir, image_name)
+         print(f"🔍 Processing: {image_name}")
+
+         # 1️⃣ Backbone classification
+         backbone_result = classify_image(image_path)
+
+         prediction = {
+             "image_name": image_name,
+             "manipulation_type": backbone_result["manipulation_type"],
+             "authenticity_score": round(backbone_result["authenticity_score"], 4),
+         }
+
+         # 2️⃣ Confident REAL → no VLM
+         # authenticity_score is 1 - P(Real), so confidence in "Real" is its complement
+         real_confidence = 1.0 - backbone_result["authenticity_score"]
+         if (
+             backbone_result["manipulation_type"] == "Real"
+             and real_confidence >= real_threshold
+         ):
+             prediction["explanation"] = "The image is real."
+
+         # 3️⃣ NON-REAL or low-confidence REAL → forensic + VLM
+         else:
+             signals = extract_forensic_signals(image_path)
+
+             explanation = vlm.analyze(
+                 image_path=image_path,
+                 backbone_result=backbone_result,
+                 signals=signals
+             )
+
+             prediction["explanation"] = explanation
+
+         results.append(prediction)
+         print(f"  ✓ {backbone_result['manipulation_type']} (score: {backbone_result['authenticity_score']:.4f})\n")
+
+     # 4️⃣ Save JSON
+     with open(output_json, "w") as f:
+         json.dump(results, f, indent=2)
+
+     print(f"✅ Pipeline finished. Results saved to {output_json}")
+
+
+ # ============================================================================
+ # MAIN
+ # ============================================================================
+
+ if __name__ == "__main__":
+     parser = argparse.ArgumentParser(
+         description="Deepfake Detection Pipeline with VLM Reasoning"
+     )
+     parser.add_argument(
+         "--input_dir",
+         required=True,
+         help="Path to folder with images"
+     )
+     parser.add_argument(
+         "--output_file",
+         required=True,
+         help="JSON file to save predictions"
+     )
+     parser.add_argument(
+         "--real_threshold",
+         type=float,
+         default=0.90,
+         help="Threshold for considering an image as 'Real' (default: 0.90)"
+     )
+
+     args = parser.parse_args()
+
+     # Validate input directory
+     if not os.path.exists(args.input_dir):
+         print(f"❌ Error: Input directory '{args.input_dir}' does not exist!")
+         raise SystemExit(1)
+
+     if not os.path.isdir(args.input_dir):
+         print(f"❌ Error: '{args.input_dir}' is not a directory!")
+         raise SystemExit(1)
+
+     # Run pipeline
+     run_pipeline(
+         image_dir=args.input_dir,
+         output_json=args.output_file,
+         real_threshold=args.real_threshold
+     )
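To get a feel for how two of the forensic signals in `predict.py` respond to image content, the sketch below re-implements the Laplacian-variance and FFT measurements with NumPy only and applies them to synthetic arrays. It is an approximation under stated assumptions: `laplacian_variance` applies the 4-neighbour Laplacian kernel to interior pixels only (so border handling differs from `cv2.Laplacian`), `mean_log_spectrum` uses `np.fft` in place of `scipy.fftpack`, and both function names and the test images are made up for illustration.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels (illustrative)."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def mean_log_spectrum(gray: np.ndarray) -> float:
    """Mean of log(|FFT| + 1), mirroring the measurement in compute_fft."""
    spectrum = np.fft.fftshift(np.fft.fft2(gray.astype(np.float64)))
    return float(np.mean(np.log(np.abs(spectrum) + 1)))


# Synthetic inputs (assumptions for the demo, not real photographs)
rng = np.random.default_rng(0)
flat = np.full((64, 64), 128, dtype=np.uint8)                  # perfectly smooth
noisy = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)    # heavy texture

for name, img in [("flat", flat), ("noisy", noisy)]:
    print(f"{name}: laplacian_var={laplacian_variance(img):.2f}, "
          f"mean_log_spectrum={mean_log_spectrum(img):.4f}")
```

A perfectly flat image has zero Laplacian variance and almost all of its spectral energy in the DC term, while the noisy image scores higher on both measures. Real photographs and synthetic images fall somewhere in between, which is why the pipeline passes these values to the VLM as context rather than thresholding them directly.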
requirements.txt CHANGED
@@ -1,7 +1,12 @@
- transformers
- timm
- accelerate
- scikit-learn
- qwen-vl-utils
- scikit-image
- opencv-python
+ transformers
+ timm
+ scikit-image
+ opencv-python
+ torch
+ numpy
+ pillow
+ scipy
+ scikit-learn
+ qwen-vl-utils
+ accelerate
+ kagglehub
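Because `predict.py` loads its models at import time, a missing package only surfaces after the script has started. A small pre-flight check along the following lines can report missing modules first. This is a sketch: `missing_modules` and `PREDICT_IMPORTS` are hypothetical names, and the list uses the import names seen at the top of `predict.py`, which differ from some pip package names (for example `opencv-python` provides `cv2` and `pillow` provides `PIL`).

```python
import importlib.util
from typing import Iterable, List

# Top-level import names used by predict.py (not the pip package names).
PREDICT_IMPORTS = [
    "torch", "numpy", "cv2", "PIL", "transformers",
    "skimage", "scipy", "qwen_vl_utils",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(PREDICT_IMPORTS)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All modules imported by predict.py are available.")
```

The check only confirms that each module can be located; it does not verify versions or GPU support.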