---
title: Scene Graph Generator
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
---
# 🧠 Scene Graph Generator (Multimodal AI System)
A multimodal computer vision system that takes an input image, detects objects, predicts relationships between them, constructs a structured scene graph, and generates a natural language description of the scene.
🔗 **Live Demo (Hugging Face Spaces):**
https://huggingface.co/spaces/<your-username>/scene-graph-generator
---
# 🚀 Features
- 🖼️ Object Detection using DETR (ResNet-50)
- 🔗 Relationship Prediction (Custom Trained Model)
- 📐 Spatial Reasoning (Hybrid AI with Geometry Rules)
- 🧩 Scene Graph Construction (Directed Graph)
- 📊 Graph Visualization (NetworkX + Matplotlib)
- 🧠 Graph-to-Text Generation (FLAN-T5)
- 🌐 Interactive UI (Gradio)
- ☁️ Deployed on Hugging Face Spaces (CPU)
---
# 🧠 How It Works (End-to-End Pipeline)
### 1. Input
- User uploads an image (JPG/PNG) via the Gradio UI
- Image is converted from PIL → OpenCV format
---
### 2. Object Detection
- Uses `facebook/detr-resnet-50` from Hugging Face
- Outputs:
  - Object labels (COCO classes)
  - Bounding boxes
  - Confidence scores
- Applies a confidence threshold (≥ 0.7) to filter out noisy detections
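
The thresholding step can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the `Detection` record and the `filter_detections` helper are hypothetical names, and the sample scores are made up. Only the 0.7 cutoff comes from the description above.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """Hypothetical container for one detector output."""
    label: str                               # COCO class name
    box: Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max) in pixels
    score: float                             # confidence in [0, 1]


def filter_detections(detections: List[Detection], threshold: float = 0.7) -> List[Detection]:
    """Keep only detections whose confidence is at least `threshold`."""
    return [d for d in detections if d.score >= threshold]


# Made-up detections for illustration only.
raw = [
    Detection("laptop", (120.0, 200.0, 420.0, 380.0), 0.98),
    Detection("mouse", (450.0, 330.0, 500.0, 370.0), 0.91),
    Detection("book", (10.0, 10.0, 60.0, 40.0), 0.42),
]
kept = filter_detections(raw)
print([d.label for d in kept])  # ['laptop', 'mouse']
```

Raising the threshold trades recall for precision: fewer objects reach the pairing stage, so the graph gets smaller but cleaner.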
---
### 3. Pairwise Object Processing
- Generates object pairs using `itertools.combinations`
- Extracts bounding boxes for each pair
- Creates union region for relation inference
- Filters duplicate object pairs
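
A rough sketch of the pairing step, assuming each detection is a hashable `(label, box)` tuple with boxes as `(x_min, y_min, x_max, y_max)`. The helper names `union_box` and `object_pairs` are hypothetical, and the duplicate filter shown here (dropping repeated or identical detections) is one plausible reading of "filters duplicate object pairs".

```python
from itertools import combinations


def union_box(a, b):
    """Smallest box (x_min, y_min, x_max, y_max) enclosing both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def object_pairs(objects):
    """Return (first, second, union_region) for each distinct unordered pair."""
    seen = set()
    pairs = []
    for a, b in combinations(objects, 2):
        key = frozenset((a, b))
        # Skip a detection paired with an identical copy of itself,
        # and skip pairs that were already emitted.
        if a == b or key in seen:
            continue
        seen.add(key)
        pairs.append((a, b, union_box(a[1], b[1])))
    return pairs


# Made-up detections; the last entry duplicates the mouse on purpose.
objects = [
    ("laptop", (120, 200, 420, 380)),
    ("mouse", (450, 330, 500, 370)),
    ("table", (0, 300, 640, 480)),
    ("mouse", (450, 330, 500, 370)),
]
pairs = object_pairs(objects)
for first, second, region in pairs:
    print(first[0], second[0], region)
```

Note that `combinations` yields unordered pairs, so a later stage has to decide which object is the subject and which is the object of the relation.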
---
### 4. Relationship Prediction
- Custom classifier trained on a Visual Genome subset (~10K samples)
- Predicts semantic relations:
  - `on`, `holding`, `behind`, etc.
- Trained using PyTorch (10 epochs)
---
### 5. Spatial Reasoning (Hybrid AI)
- Uses bounding box geometry to compute:
  - `left_of`, `right_of`, `above`, `below`, `near`
- Hybrid logic:
  - Uses the semantic relation from the model (if confident)
  - Otherwise falls back to spatial rules
- Reduces prediction bias (e.g., “everything = on”)
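
The hybrid logic could look roughly like the sketch below. Everything numeric here is an assumption: the README does not state the confidence cutoff (`min_conf`), how `near` is defined (`near_factor`), or the order in which the rules are checked. The function names are hypothetical as well.

```python
import math


def center(box):
    """Center point of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def diagonal(box):
    """Length of the box diagonal."""
    return math.hypot(box[2] - box[0], box[3] - box[1])


def spatial_relation(subj_box, obj_box, near_factor=0.5):
    """Rule-based relation of the subject relative to the object.

    Image coordinates are assumed, so y grows downward.
    """
    (sx, sy), (ox, oy) = center(subj_box), center(obj_box)
    dx, dy = sx - ox, sy - oy
    # Assumed rule: "near" when the centers are closer than a fraction
    # of the mean box diagonal.
    mean_diag = (diagonal(subj_box) + diagonal(obj_box)) / 2.0
    if math.hypot(dx, dy) < near_factor * mean_diag:
        return "near"
    # Otherwise pick the dominant axis of displacement.
    if abs(dx) >= abs(dy):
        return "left_of" if dx < 0 else "right_of"
    return "above" if dy < 0 else "below"


def hybrid_relation(model_label, model_conf, subj_box, obj_box, min_conf=0.6):
    """Trust the classifier when it is confident, else use geometry."""
    if model_conf >= min_conf:
        return model_label
    return spatial_relation(subj_box, obj_box)


laptop = (120, 200, 420, 380)
mouse = (450, 330, 500, 370)
print(hybrid_relation("on", 0.35, mouse, laptop))   # right_of (low confidence, geometry wins)
print(hybrid_relation("on", 0.92, mouse, laptop))   # on (confident model prediction kept)
```

Falling back to geometry only for low-confidence predictions is what keeps a classifier that over-predicts `on` from labeling every edge the same way.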
---
### 6. Graph Construction
- Builds a **directed graph (NetworkX DiGraph)**
  - Nodes → objects
  - Edges → relationships
- Removes duplicate edges and limits the number of edges for clarity
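
The deduplication and edge cap can be illustrated without any dependencies. This sketch stands in a plain dict for the NetworkX graph so it runs anywhere. The helper names, the "first relation wins" policy, and the `max_edges` default are all assumptions.

```python
def build_edges(triples, max_edges=10):
    """Deduplicate (subject, relation, object) triples and cap their number.

    At most one edge is kept per ordered (subject, object) pair; the first
    relation seen for a pair wins (an assumed policy).
    """
    edges = {}
    for subj, rel, obj in triples:
        if subj == obj:
            continue  # no self-loops
        edges.setdefault((subj, obj), rel)
    kept = list(edges.items())[:max_edges]
    return [(subj, rel, obj) for (subj, obj), rel in kept]


def to_adjacency(edges):
    """Directed adjacency mapping: node -> {neighbor: relation}."""
    graph = {}
    for subj, rel, obj in edges:
        graph.setdefault(subj, {})[obj] = rel
        graph.setdefault(obj, {})  # make sure sink nodes appear too
    return graph


triples = [
    ("laptop", "on", "table"),
    ("mouse", "next_to", "laptop"),
    ("laptop", "on", "table"),     # exact duplicate
    ("mouse", "on", "table"),      # dropped by the cap below
]
edges = build_edges(triples, max_edges=2)
print(edges)
print(to_adjacency(edges))
```

With NetworkX, each kept triple would map onto a `DiGraph` via `add_edge(subject, obj, label=relation)`; the `label` attribute name is an illustrative choice, since `add_edge` accepts arbitrary keyword attributes.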
---
### 7. Graph Visualization
- Uses NetworkX + Matplotlib
- Displays:
  - Directed edges with labels
  - Clean layout for readability
---
### 8. Graph → Text (NLP)
- Uses `google/flan-t5-small`
- Converts structured triples into natural language

Example:
- `laptop → on → table`
- `mouse → next_to → laptop`

Output:
"A laptop is placed on a table with a mouse next to it."
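
One way to turn the triples into model input is to flatten them into a short instruction prompt. The sketch below covers only that string-building step. The prompt wording and the `triples_to_prompt` helper are assumptions, and the actual call to `google/flan-t5-small` is left out.

```python
def triples_to_prompt(triples):
    """Flatten (subject, relation, object) triples into one instruction prompt."""
    facts = "; ".join(
        f"{subj} {rel.replace('_', ' ')} {obj}" for subj, rel, obj in triples
    )
    # Hypothetical instruction text; the real prompt may differ.
    return f"Describe the scene in one sentence: {facts}."


triples = [("laptop", "on", "table"), ("mouse", "next_to", "laptop")]
prompt = triples_to_prompt(triples)
print(prompt)
# Describe the scene in one sentence: laptop on table; mouse next to laptop.
```

Replacing underscores with spaces keeps relation names such as `next_to` closer to natural language, which tends to matter for a small instruction-tuned model.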
---
### 9. UI (Gradio)
- Upload image
- View:
  - Scene graph
  - Generated description
- Fully interactive and browser-based
---
# 🏗️ Tech Stack
- **Computer Vision:** DETR (Hugging Face Transformers)
- **Deep Learning:** PyTorch
- **Graph Processing:** NetworkX
- **NLP:** FLAN-T5
- **Image Processing:** OpenCV
- **Frontend/UI:** Gradio
- **Deployment:** Hugging Face Spaces
---
# 📁 Project Structure
scene-graph-generator/
│
├── app.py
├── requirements.txt
├── README.md
│
├── src/
│   ├── pipeline.py
│   ├── detection.py
│   ├── spatial_rules.py
│   ├── relationship_infer.py
│   ├── scene_graph.py
│   ├── visualization.py
│   ├── text_generation.py
---
# ⚙️ Installation (Local Setup)
```bash
git clone https://github.com/<your-username>/scene-graph-generator.git
cd scene-graph-generator
pip install -r requirements.txt
python app.py