---
title: Real Estate Manipulation Detector
emoji: 🏠
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
tags:
  - computer-vision
  - forensics
  - real-estate
  - blip
  - resnet
license: mit
---

# 🕵️‍♂️ Real Estate Manipulation Detector (Hybrid VLM-CNN)

**Team Name:** Lina Alkhatib
**Track:** Track B (Real Estate)
**Date:** January 28, 2026

## 1. Executive Summary
This project implements an automated forensic system designed to detect and explain digital manipulations in real estate imagery. Addressing the challenge of "fake listings," our solution employs a **Hybrid Vision-Language Architecture**. By combining the high-speed pattern recognition of a Convolutional Neural Network (**ResNet-18**) with the semantic reasoning capabilities of a Vision-Language Model (**BLIP**), the system achieves both high detection accuracy and human-readable interpretability.

## 2. System Architecture
The system operates as a **Serial Cascading Pipeline** built from two distinct modules:

### Module 1: The Detector (Quantitative Analysis)
* **Architecture:** ResNet-18 (Residual Neural Network).
* **Role:** Rapid three-way classification and manipulation-type scoring.
* **Classes:** `Real`, `Fake_AI`, `Fake_Splice`.
* **Output:** An `Authenticity Score` (0.0 - 1.0) and a predicted class label.

### Module 2: The Reasoner (Qualitative Forensics)
* **Architecture:** BLIP (Visual Question Answering).
* **Role:** Semantic analysis and report generation.
* **Mechanism:** The model answers targeted physics-based questions about shadows, lighting consistency, and floating objects; its answers are compiled into a forensic report.

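The interrogation mechanism can be sketched as follows. The question wording is loosely adapted from this README, and the BLIP call is stubbed out so the sketch runs without model weights; a real run would plug in an actual VQA model in place of `stub_blip`:

```python
# Physics-based "interrogation" prompts; exact wording is an assumption.
FORENSIC_QUESTIONS = [
    "Does the object cast a shadow?",
    "Is the lighting in the room consistent?",
    "Is any object floating above the floor?",
]

def interrogate(answer_fn, image):
    """Ask every forensic question about the image.

    answer_fn stands in for a BLIP VQA call and maps
    (image, question) -> short textual answer.
    """
    return {q: answer_fn(image, q) for q in FORENSIC_QUESTIONS}

def stub_blip(image, question):
    # Stub answerer standing in for the real model during this sketch.
    return "no" if "shadow" in question else "yes"

report = interrogate(stub_blip, image=None)
```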
## 3. The "Fusion Strategy"
We use a **Conditional Logic Fusion Strategy**:
1. **Step 1:** The image is passed through ResNet-18.
2. **Step 2:** If flagged as `Fake`, the image is passed to BLIP.
3. **Step 3:** BLIP is "interrogated" with targeted prompts (*"Does the object cast a shadow?"*, *"Is the lighting consistent?"*).
4. **Step 4:** A logic layer synthesizes the answers into a final text report (e.g., *"Manipulation detected: the chair lacks a grounded contact shadow"*).

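The four-step cascade above can be sketched as a small fusion function; the question keys and report wording are illustrative assumptions, not the exact strings used by `predict.py`:

```python
def fuse(label: str, authenticity: float, vqa_answers: dict) -> str:
    """Conditional logic fusion: only images flagged as fake get a BLIP report."""
    if label == "Real":
        # Step 2 short-circuits: authentic images skip the reasoner entirely.
        return f"Image appears authentic (score {authenticity:.2f})."
    # Step 4: translate the reasoner's answers into human-readable findings.
    findings = []
    if vqa_answers.get("Does the object cast a shadow?") == "no":
        findings.append("an object lacks a grounded contact shadow")
    if vqa_answers.get("Is the lighting in the room consistent?") == "no":
        findings.append("lighting is inconsistent across the scene")
    detail = "; ".join(findings) or "semantic cues were inconclusive"
    return f"Manipulation detected ({label}): {detail}."

print(fuse("Fake_Splice", 0.12, {"Does the object cast a shadow?": "no"}))
# → Manipulation detected (Fake_Splice): an object lacks a grounded contact shadow.
```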
## 4. How to Run
1. Clone this repository.
2. Install dependencies: `pip install -r requirements.txt`
3. Run the inference script:
```bash
python predict.py --input_dir ./test_images --output_file submission.json --model_path detector_model.pth
```

## 5. Files in this Repo
* `predict.py`: The main inference script.
* `detector_model.pth`: The trained ResNet-18 weights.
* `requirements.txt`: Python dependencies.