---
title: Real Estate Manipulation Detector
emoji: 🏠
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
tags:
- computer-vision
- forensics
- real-estate
- blip
- resnet
license: mit
---

# 🕵️‍♂️ Real Estate Manipulation Detector (Hybrid VLM-CNN)

**Team Name:** Lina Alkhatib
**Track:** Track B (Real Estate)
**Date:** January 28, 2026

## 1. Executive Summary
This project implements an automated forensic system that detects and explains digital manipulations in real estate imagery. To address the challenge of "fake listings," our solution employs a **Hybrid Vision-Language Architecture**: by combining the high-speed pattern recognition of a Convolutional Neural Network (**ResNet-18**) with the semantic reasoning of a Vision-Language Model (**BLIP**), the system achieves both high detection accuracy and human-readable interpretability.

## 2. System Architecture
The system operates as a **Serial Cascading Pipeline** with two distinct modules:

### Module 1: The Detector (Quantitative Analysis)
* **Architecture:** ResNet-18 (Residual Neural Network).
* **Role:** Rapid three-way classification and manipulation-type scoring.
* **Classes:** `Real`, `Fake_AI`, `Fake_Splice`.
* **Output:** An `Authenticity Score` (0.0–1.0) and a predicted class label.
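The detector's two outputs can be derived from its three-class logits. A minimal, dependency-free sketch (the class order `[Real, Fake_AI, Fake_Splice]` and the choice of P(`Real`) as the Authenticity Score are assumptions, not the repo's verified behavior):

```python
import math

CLASSES = ["Real", "Fake_AI", "Fake_Splice"]  # assumed head order

def score_logits(logits):
    """Map raw 3-class logits to (authenticity_score, predicted_label).

    The Authenticity Score is taken here to be the softmax probability
    of the `Real` class, so 1.0 means confidently authentic.
    """
    m = max(logits)                              # stabilize the exponentials
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    label = CLASSES[probs.index(max(probs))]
    return probs[0], label

# Example: logits leaning toward a spliced image
score, label = score_logits([0.2, 0.5, 2.1])  # low score, label "Fake_Splice"
```

In the real pipeline these logits would come from the ResNet-18 forward pass on a normalized image tensor.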

### Module 2: The Reasoner (Qualitative Forensics)
* **Architecture:** BLIP (Visual Question Answering).
* **Role:** Semantic analysis and report generation.
* **Mechanism:** The model answers targeted physics-based questions about shadows, lighting, and floating objects to generate a forensic report.
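The interrogation loop can be sketched independently of the model backend. Here `vqa` is a placeholder for a BLIP question-answering call, and the exact questions and suspicious-answer pairs are illustrative, not the repo's actual prompt set:

```python
# Physics-based checks: each maps a question to the answer that
# would indicate a manipulation, plus a human-readable finding.
FORENSIC_CHECKS = [
    ("Does the object cast a shadow?", "no", "missing contact shadow"),
    ("Is the lighting consistent across the scene?", "no", "inconsistent lighting"),
    ("Is any object floating above the floor?", "yes", "floating object"),
]

def interrogate(image, vqa):
    """Run each check through a VQA callable and collect findings.

    `vqa(image, question)` should return a short answer string,
    e.g. from a BLIP VQA pipeline.
    """
    findings = []
    for question, suspicious_answer, finding in FORENSIC_CHECKS:
        if vqa(image, question).strip().lower() == suspicious_answer:
            findings.append(finding)
    return findings
```

With a real backend, `vqa` would wrap a BLIP VQA checkpoint such as Hugging Face's `Salesforce/blip-vqa-base`.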

## 3. The "Fusion Strategy"
We use a **Conditional Logic Fusion Strategy**:
1. **Step 1:** The image is passed through ResNet-18.
2. **Step 2:** If the image is flagged as `Fake`, it is passed to BLIP.
3. **Step 3:** BLIP is "interrogated" with targeted prompts (*"Does the object cast a shadow?"*, *"Is the lighting consistent?"*).
4. **Step 4:** A logic layer synthesizes the answers into a final text report (e.g., *"Manipulation detected: the chair lacks a grounded contact shadow"*).
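Put together, the four steps amount to a short conditional cascade. A sketch with both modules stubbed out as callables (`detect`, `interrogate`, and the 0.5 threshold are placeholders for the ResNet-18 and BLIP stages above, not the repo's exact logic):

```python
def analyze(image, detect, interrogate, threshold=0.5):
    """Serial cascade: run the cheap detector first, the reasoner only on suspects.

    `detect(image)` -> (authenticity_score, label)
    `interrogate(image)` -> list of human-readable findings
    """
    score, label = detect(image)
    if label == "Real" and score >= threshold:
        return f"Authentic (score={score:.2f})."
    findings = interrogate(image)
    if findings:
        return f"Manipulation detected ({label}): " + "; ".join(findings) + "."
    return f"Flagged as {label} (score={score:.2f}); no specific physics violation found."
```

Because BLIP only runs on images the detector flags, the expensive VLM pass is skipped for the majority of authentic listings.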

## 4. How to Run
1. Clone this repository.
2. Install dependencies: `pip install -r requirements.txt`
3. Run the inference script:
   ```bash
   python predict.py --input_dir ./test_images --output_file submission.json --model_path detector_model.pth
   ```

## 5. Files in this Repo
* `predict.py`: The main inference script.
* `detector_model.pth`: The trained ResNet-18 weights.
* `requirements.txt`: Python dependencies.