
	---
	title: Real Estate Manipulation Detector
	emoji: 🏠
	colorFrom: blue
	colorTo: indigo
	sdk: static
	pinned: false
	tags:
	- computer-vision
	- forensics
	- real-estate
	- blip
	- resnet
	license: mit
	---

	# 🕵️‍♂️ Real Estate Manipulation Detector (Hybrid VLM-CNN)

	* Team Name: Lina Alkhatib
	* Track: Track B (Real Estate)
	* Date: January 28, 2026

	## 1. Executive Summary
	This project implements an automated forensic system that detects and explains digital manipulations in real estate imagery. To address the problem of "fake listings," it uses a Hybrid Vision-Language Architecture. A Convolutional Neural Network (ResNet-18) provides fast, pattern-based detection, and a Vision-Language Model (BLIP) adds semantic reasoning. As a result, every flagged image receives both a detection score and a human-readable explanation.

	## 2. System Architecture
	The system is a Serial Cascading Pipeline built from two distinct modules:

	### Module 1: The Detector (Quantitative Analysis)
	* Architecture: ResNet-18 (Residual Network, 18 layers).
	* Role: Fast classification of each image as authentic or manipulated, including the manipulation type.
	* Classes: `Real`, `Fake_AI`, `Fake_Splice`.
	* Output: An `Authenticity Score` (0.0 to 1.0) and a predicted class label.
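
The card does not say how the three class outputs become a single `Authenticity Score`. The following is a minimal sketch that assumes the score is the softmax probability of the `Real` class. The class order in `CLASSES` and the helper names `softmax` and `score_image` are hypothetical.

```python
import math

# Assumed class order; the real order depends on how the detector was trained.
CLASSES = ("Real", "Fake_AI", "Fake_Splice")


def softmax(logits):
    """Numerically stable softmax over a sequence of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_image(logits):
    """Map three detector logits to (predicted label, authenticity score).

    Assumption: the authenticity score is P(Real), so it lies in [0.0, 1.0]
    and values near 1.0 mean the image looks authentic.
    """
    if len(logits) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[CLASSES.index("Real")]


if __name__ == "__main__":
    label, score = score_image([0.2, 2.5, -1.0])
    print(label, round(score, 3))
```

With this definition, the label and the score are always consistent: an image labeled `Real` has a score above 1/3. A score that is low but not minimal can still come with a fake label, which is why the card reports both values.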

	### Module 2: The Reasoner (Qualitative Forensics)
	* Architecture: BLIP, used for Visual Question Answering (VQA).
	* Role: Semantic analysis and report generation.
	* Mechanism: The model answers targeted, physics-based questions about shadows, lighting, and whether objects appear to float, and these answers form the basis of the forensic report.
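
The following is a minimal sketch of the interrogation step using the BLIP VQA classes from Hugging Face `transformers`. The checkpoint `Salesforce/blip-vqa-base`, the floating-object question, and the helper names `make_blip_answerer` and `interrogate` are assumptions and are not taken from `predict.py`. The model is wrapped in a plain `(image, question) -> answer` function, which lets the questioning logic run against a stub without downloading weights.

```python
from typing import Callable, Dict

# The first two prompts come from the Fusion Strategy section;
# the third is an illustrative assumption.
FORENSIC_QUESTIONS = (
    "Does the object cast a shadow?",
    "Is lighting consistent?",
    "Is any object floating above the floor?",
)

# An answerer maps (image, question) to a short text answer.
Answerer = Callable[[object, str], str]


def make_blip_answerer(checkpoint: str = "Salesforce/blip-vqa-base") -> Answerer:
    """Build an answerer backed by a BLIP VQA model (assumed checkpoint).

    Requires `transformers` and `torch`; `image` should be a PIL RGB image.
    The import is deferred so the rest of this module works without them.
    """
    from transformers import BlipForQuestionAnswering, BlipProcessor

    processor = BlipProcessor.from_pretrained(checkpoint)
    model = BlipForQuestionAnswering.from_pretrained(checkpoint)

    def ask(image, question: str) -> str:
        inputs = processor(image, question, return_tensors="pt")
        output_ids = model.generate(**inputs)
        return processor.decode(output_ids[0], skip_special_tokens=True).strip().lower()

    return ask


def interrogate(image, ask: Answerer, questions=FORENSIC_QUESTIONS) -> Dict[str, str]:
    """Ask every forensic question about one image and collect the answers."""
    return {q: ask(image, q) for q in questions}
```

For a real photo, a call such as `interrogate(Image.open("room.jpg").convert("RGB"), make_blip_answerer())` downloads the checkpoint on first use. It returns one short answer per question, typically "yes" or "no".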

	## 3. The "Fusion Strategy"
	We use a Conditional Logic Fusion Strategy:
	1. Step 1: The image is passed through ResNet-18.
	2. Step 2: If it is classified as `Fake_AI` or `Fake_Splice`, the image is passed to BLIP.
	3. Step 3: BLIP is "interrogated" with targeted prompts ("Does the object cast a shadow?", "Is lighting consistent?").
	4. Step 4: A logic layer synthesizes the answers into a final text report (e.g., "Manipulation detected: the chair lacks a grounded contact shadow").
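
The conditional cascade can be sketched in plain Python. This illustrates the four steps under stated assumptions and is not the logic in `predict.py`. The detector and BLIP are passed in as callables, and the rule table (question, suspicious answer, finding), its wording, and the helper names `first_word` and `analyze` are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical interfaces for the two modules.
Detector = Callable[[object], Tuple[str, float]]  # image -> (label, authenticity score)
Answerer = Callable[[object, str], str]           # (image, question) -> answer text

# Each rule: (question, answer that signals a problem, finding for the report).
RULES = (
    ("Does the object cast a shadow?", "no", "the object lacks a grounded contact shadow"),
    ("Is lighting consistent?", "no", "the lighting is inconsistent across the scene"),
    ("Is any object floating above the floor?", "yes", "an object appears to float"),
)


def first_word(text: str) -> str:
    """Normalize a free-text VQA answer to its lowercase first word."""
    words = text.strip().lower().split()
    return words[0].strip(".,!?") if words else ""


def analyze(image, detect: Detector, ask: Answerer) -> Dict[str, object]:
    """Serial cascade: run the detector, and query the VLM only for fakes."""
    label, score = detect(image)                      # Step 1: ResNet-18
    result = {"label": label, "authenticity_score": score,
              "report": "No manipulation detected."}
    if label == "Real":                               # Step 2: gate on the label
        return result

    # Step 3: ask every targeted question and keep findings whose answer is suspicious.
    findings = [finding for question, suspicious, finding in RULES
                if first_word(ask(image, question)) == suspicious]

    if findings:                                      # Step 4: synthesize the report
        result["report"] = "Manipulation detected: " + "; ".join(findings) + "."
    else:
        result["report"] = (f"Manipulation detected ({label}); "
                            "the VLM confirmed no specific physical inconsistency.")
    return result
```

Because the pipeline gates on the detector's label, authentic images never incur the cost of VLM text generation. That saving is the main reason to run the two modules in series rather than in parallel.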

	## 4. How to Run
	1. Clone this repository.
	2. Install dependencies: `pip install -r requirements.txt`
	3. Run the inference script:
	```bash
	python predict.py --input_dir ./test_images --output_file submission.json --model_path detector_model.pth
	```
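
The following is a hypothetical skeleton showing how a script that accepts the three flags in the run command above could parse them and collect its input images. The defaults, the accepted file extensions, and the helper names `parse_args` and `list_images` are assumptions. The actual `predict.py` may differ.

```python
import argparse
from pathlib import Path
from typing import List, Optional, Sequence

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}  # assumed input formats


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse --input_dir, --output_file and --model_path."""
    parser = argparse.ArgumentParser(description="Real estate manipulation detector")
    parser.add_argument("--input_dir", type=Path, required=True,
                        help="folder of images to analyze")
    parser.add_argument("--output_file", type=Path, default=Path("submission.json"),
                        help="where to write the JSON results")
    parser.add_argument("--model_path", type=Path, default=Path("detector_model.pth"),
                        help="trained ResNet-18 weights")
    return parser.parse_args(argv)


def list_images(input_dir: Path) -> List[Path]:
    """Return image files in input_dir, sorted so output order is reproducible."""
    return sorted(p for p in input_dir.iterdir()
                  if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS)
```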

	## 5. Files in this Repo
	* `predict.py`: The main inference script.
	* `detector_model.pth`: The trained ResNet-18 weights.
	* `requirements.txt`: Python dependencies.