---
title: Real Estate Manipulation Detector
emoji: 🏠
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
tags:
- computer-vision
- forensics
- real-estate
- blip
- resnet
license: mit
---

# 🕵️‍♂️ Real Estate Manipulation Detector (Hybrid VLM-CNN)

**Team Name:** Lina Alkhatib

**Track:** Track B (Real Estate)

**Date:** January 28, 2026

## 1. Executive Summary

This project implements an automated forensic system that detects and explains digital manipulations in real estate imagery. It targets the problem of fake listings with a **Hybrid Vision-Language Architecture**. A Convolutional Neural Network (**ResNet-18**) provides fast pattern recognition and makes the detection decision. A Vision-Language Model (**BLIP**) adds semantic reasoning and turns that decision into a human-readable explanation.

## 2. System Architecture

The system runs as a **Serial Cascading Pipeline** of two modules. The output of the first module determines whether the second one runs:

### Module 1: The Detector (Quantitative Analysis)

* **Architecture:** ResNet-18 (Residual Neural Network).
* **Role:** Fast real-vs-fake screening that also identifies the manipulation type.
* **Classes:** `Real`, `Fake_AI`, `Fake_Splice`.
* **Output:** An `Authenticity Score` (0.0 to 1.0) and a predicted class label.

### Module 2: The Reasoner (Qualitative Forensics)

* **Architecture:** BLIP, used for Visual Question Answering (VQA).
* **Role:** Semantic analysis and report generation.
* **Mechanism:** The model answers targeted, physics-based questions about shadows, lighting, and floating objects. These answers form the basis of the forensic report.
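
One way to query BLIP for these questions is with the Hugging Face `transformers` VQA classes. The sketch assumes the `Salesforce/blip-vqa-base` checkpoint, because the README does not name one. The first two prompts come from the Fusion Strategy section. The third prompt, `FORENSIC_QUESTIONS`, and `interrogate` are hypothetical.

```python
# Hypothetical question set: the first two prompts are quoted in this README,
# the "floating" prompt is an illustrative assumption.
FORENSIC_QUESTIONS = {
    "shadow": "Does the object cast a shadow?",
    "lighting": "Is lighting consistent?",
    "floating": "Is any object floating above the floor?",
}

# Assumed checkpoint; the README only says "BLIP (Visual Question Answering)".
BLIP_CHECKPOINT = "Salesforce/blip-vqa-base"


def interrogate(image, questions=FORENSIC_QUESTIONS, checkpoint=BLIP_CHECKPOINT):
    """Ask BLIP each question about a PIL image and return {key: lowercase answer}."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import BlipForQuestionAnswering, BlipProcessor

    processor = BlipProcessor.from_pretrained(checkpoint)
    model = BlipForQuestionAnswering.from_pretrained(checkpoint)
    answers = {}
    for key, question in questions.items():
        inputs = processor(images=image, text=question, return_tensors="pt")
        output_ids = model.generate(**inputs)
        answers[key] = processor.decode(output_ids[0], skip_special_tokens=True).strip().lower()
    return answers
```

This version reloads the model on every call, which keeps the example short. A real pipeline would load the processor and model once and reuse them across images.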

## 3. The "Fusion Strategy"

We use a **Conditional Logic Fusion Strategy**:

1. **Step 1:** The image is passed through ResNet-18.
2. **Step 2:** If the image is classified as `Fake_AI` or `Fake_Splice`, it is passed to BLIP. Images classified as `Real` skip the remaining steps.
3. **Step 3:** BLIP is "interrogated" with targeted prompts (*"Does the object cast a shadow?"*, *"Is lighting consistent?"*).
4. **Step 4:** A logic layer combines the answers into a final text report (e.g., *"Manipulation detected: the chair lacks a grounded contact shadow"*).
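
The conditional fusion in these steps can be sketched in plain Python. The detector's label and score are taken as already computed (Step 1). The BLIP query is passed in as a zero-argument callable, so it runs only for flagged images. The answer-to-finding table `SUSPICIOUS_ANSWERS` and the `fuse` function are hypothetical illustrations of the "logic layer", not the project's actual rules.

```python
from typing import Callable, Dict, Tuple

# Hypothetical rules: question key -> (answer that signals manipulation, report finding).
SUSPICIOUS_ANSWERS: Dict[str, Tuple[str, str]] = {
    "shadow": ("no", "the object lacks a grounded contact shadow"),
    "lighting": ("no", "lighting is inconsistent across the scene"),
    "floating": ("yes", "an object appears to float above the floor"),
}


def fuse(label: str, score: float, interrogate: Callable[[], Dict[str, str]]) -> str:
    """Build the final report; BLIP is consulted only when the detector flags a fake."""
    if label == "Real":
        # Step 2: authentic images skip the reasoner entirely.
        return f"Authentic (authenticity score {score:.2f})."
    answers = interrogate()  # Step 3: targeted questions to BLIP
    # Step 4: keep every finding whose answer matches its suspicious value.
    findings = [
        finding
        for key, (bad_answer, finding) in SUSPICIOUS_ANSWERS.items()
        if answers.get(key) == bad_answer
    ]
    detail = "; ".join(findings) if findings else "no specific physical inconsistency identified"
    return f"Manipulation detected ({label}, authenticity score {score:.2f}): {detail}."


if __name__ == "__main__":
    stub_answers = {"shadow": "no", "lighting": "yes", "floating": "no"}
    print(fuse("Fake_Splice", 0.12, lambda: stub_answers))
```

Passing the BLIP call as a callable keeps the expensive VLM off the hot path. Most authentic images stop after the CNN.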

## 4. How to Run

1. Clone this repository.
2. Install dependencies: `pip install -r requirements.txt`
3. Run the inference script:

   ```bash
   python predict.py --input_dir ./test_images --output_file submission.json --model_path detector_model.pth
   ```
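
`predict.py` is not reproduced in this README. The sketch below shows one way its command-line interface could be parsed so that it mirrors the three flags in the run command. It assumes all three flags are required and that inputs use common image extensions. `parse_args`, `list_images`, and `IMAGE_EXTENSIONS` are hypothetical.

```python
import argparse
from pathlib import Path

# Assumed accepted formats; the README does not list which ones predict.py reads.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the three flags used in the README's run command."""
    parser = argparse.ArgumentParser(description="Real estate manipulation detector (inference)")
    parser.add_argument("--input_dir", type=Path, required=True, help="folder of images to score")
    parser.add_argument("--output_file", type=Path, required=True, help="where to write JSON results")
    parser.add_argument("--model_path", type=Path, required=True, help="trained ResNet-18 weights (.pth)")
    return parser.parse_args(argv)


def list_images(input_dir: Path) -> list[Path]:
    """Return image files in input_dir, sorted by path for a deterministic processing order."""
    return sorted(p for p in input_dir.iterdir() if p.suffix.lower() in IMAGE_EXTENSIONS)
```

The extension check is case-insensitive, so files such as `photo.JPG` are still picked up.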

## 5. Files in this Repo

* `predict.py`: The main inference script.
* `detector_model.pth`: The trained ResNet-18 weights.
* `requirements.txt`: Python dependencies.