---
title: Real Estate Manipulation Detector
emoji: 🏠
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
tags:
- computer-vision
- forensics
- real-estate
- blip
- resnet
license: mit
---

# 🕵️‍♂️ Real Estate Manipulation Detector (Hybrid VLM-CNN)

**Team Name:** Lina Alkhatib
**Track:** Track B (Real Estate)
**Date:** January 28, 2026

## 1. Executive Summary
This project implements an automated forensic system that detects and explains digital manipulations in real estate imagery. To address the challenge of "fake listings," our solution employs a **Hybrid Vision-Language Architecture**: by combining the high-speed pattern recognition of a Convolutional Neural Network (**ResNet-18**) with the semantic reasoning of a Vision-Language Model (**BLIP**), the system achieves both high detection accuracy and human-readable interpretability.

## 2. System Architecture
The system operates as a **Serial Cascading Pipeline** with two distinct modules:

### Module 1: The Detector (Quantitative Analysis)
* **Architecture:** ResNet-18 (Residual Neural Network).
* **Role:** Rapid three-way classification and manipulation-type scoring.
* **Classes:** `Real`, `Fake_AI`, `Fake_Splice`.
* **Output:** An `Authenticity Score` (0.0–1.0) and a predicted class label.
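The detector's two outputs can be derived from its three-class logits. A minimal, dependency-free sketch (the class order `[Real, Fake_AI, Fake_Splice]` and the choice of P(`Real`) as the Authenticity Score are assumptions, not the repo's verified behavior):

```python
import math

CLASSES = ["Real", "Fake_AI", "Fake_Splice"]  # assumed head order

def score_logits(logits):
    """Map raw 3-class logits to (authenticity_score, predicted_label).

    The Authenticity Score is taken here to be the softmax probability
    of the `Real` class, so 1.0 means confidently authentic.
    """
    m = max(logits)                              # stabilize the exponentials
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    label = CLASSES[probs.index(max(probs))]
    return probs[0], label

# Example: logits leaning toward a spliced image
score, label = score_logits([0.2, 0.5, 2.1])  # low score, label "Fake_Splice"
```

In the real pipeline these logits would come from the ResNet-18 forward pass on a normalized image tensor.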

### Module 2: The Reasoner (Qualitative Forensics)
* **Architecture:** BLIP (Visual Question Answering).
* **Role:** Semantic analysis and report generation.
* **Mechanism:** The model answers targeted physics-based questions about shadows, lighting, and floating objects to generate a forensic report.
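The interrogation loop can be sketched independently of the model backend. Here `vqa` is a placeholder for a BLIP question-answering call, and the exact questions and suspicious-answer pairs are illustrative, not the repo's actual prompt set:

```python
# Physics-based checks: each maps a question to the answer that
# would indicate a manipulation, plus a human-readable finding.
FORENSIC_CHECKS = [
    ("Does the object cast a shadow?", "no", "missing contact shadow"),
    ("Is the lighting consistent across the scene?", "no", "inconsistent lighting"),
    ("Is any object floating above the floor?", "yes", "floating object"),
]

def interrogate(image, vqa):
    """Run each check through a VQA callable and collect findings.

    `vqa(image, question)` should return a short answer string,
    e.g. from a BLIP VQA pipeline.
    """
    findings = []
    for question, suspicious_answer, finding in FORENSIC_CHECKS:
        if vqa(image, question).strip().lower() == suspicious_answer:
            findings.append(finding)
    return findings
```

With a real backend, `vqa` would wrap a BLIP VQA checkpoint such as Hugging Face's `Salesforce/blip-vqa-base`.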

## 3. The "Fusion Strategy"
We use a **Conditional Logic Fusion Strategy**:
1. **Step 1:** The image is passed through ResNet-18.
2. **Step 2:** If the image is flagged as `Fake`, it is passed to BLIP.
3. **Step 3:** BLIP is "interrogated" with targeted prompts (*"Does the object cast a shadow?"*, *"Is the lighting consistent?"*).
4. **Step 4:** A logic layer synthesizes the answers into a final text report (e.g., *"Manipulation detected: the chair lacks a grounded contact shadow"*).
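Put together, the four steps amount to a short conditional cascade. A sketch with both modules stubbed out as callables (`detect`, `interrogate`, and the 0.5 threshold are placeholders for the ResNet-18 and BLIP stages above, not the repo's exact logic):

```python
def analyze(image, detect, interrogate, threshold=0.5):
    """Serial cascade: run the cheap detector first, the reasoner only on suspects.

    `detect(image)` -> (authenticity_score, label)
    `interrogate(image)` -> list of human-readable findings
    """
    score, label = detect(image)
    if label == "Real" and score >= threshold:
        return f"Authentic (score={score:.2f})."
    findings = interrogate(image)
    if findings:
        return f"Manipulation detected ({label}): " + "; ".join(findings) + "."
    return f"Flagged as {label} (score={score:.2f}); no specific physics violation found."
```

Because BLIP only runs on images the detector flags, the expensive VLM pass is skipped for the majority of authentic listings.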

## 4. How to Run
1. Clone this repository.
2. Install dependencies: `pip install -r requirements.txt`
3. Run the inference script:
   ```bash
   python predict.py --input_dir ./test_images --output_file submission.json --model_path detector_model.pth
   ```

## 5. Files in this Repo
* `predict.py`: The main inference script.
* `detector_model.pth`: The trained ResNet-18 weights.
* `requirements.txt`: Python dependencies.