---
title: Scene Graph Generator
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
---

# 🧠 Scene Graph Generator (Multimodal AI System)

A multimodal computer vision system that takes an input image, detects objects, predicts relationships between them, constructs a structured scene graph, and generates a natural language description of the scene.

🔗 **Live Demo (Hugging Face Spaces):**  
https://huggingface.co/spaces/<your-username>/scene-graph-generator

---

# 🚀 Features

- 🖼️ Object Detection using DETR (ResNet-50)
- 🔗 Relationship Prediction (Custom Trained Model)
- 📐 Spatial Reasoning (Hybrid AI with Geometry Rules)
- 🧩 Scene Graph Construction (Directed Graph)
- 📊 Graph Visualization (NetworkX + Matplotlib)
- 🧠 Graph-to-Text Generation (FLAN-T5)
- 🌐 Interactive UI (Gradio)
- ☁️ Deployed on Hugging Face Spaces (CPU)

---

# 🧠 How It Works (End-to-End Pipeline)

### 1. Input
- User uploads an image (JPG/PNG) via Gradio UI
- Image is converted from PIL → OpenCV format
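
The PIL-to-OpenCV conversion can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper name `pil_to_bgr` is hypothetical, and it uses NumPy slicing rather than OpenCV itself so that it runs without `cv2` installed.

```python
import numpy as np
from PIL import Image


def pil_to_bgr(image: Image.Image) -> np.ndarray:
    """Convert a PIL image to an OpenCV-style BGR uint8 array (hypothetical helper)."""
    rgb = np.array(image.convert("RGB"))  # shape (height, width, 3), RGB order
    # Reverse the channel axis (RGB -> BGR) and copy into contiguous memory,
    # which OpenCV functions generally expect.
    return np.ascontiguousarray(rgb[:, :, ::-1])


# A 4x2 pure-red test image stands in for an uploaded file.
red = Image.new("RGB", (4, 2), (255, 0, 0))
bgr = pil_to_bgr(red)
print(bgr.shape, bgr[0, 0].tolist())
```

With OpenCV available, `cv2.cvtColor(rgb, cv2.COLOR_RGB2BGR)` performs the same channel swap.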

---

### 2. Object Detection
- Uses `facebook/detr-resnet-50` from Hugging Face
- Outputs:
  - Object labels (COCO classes)
  - Bounding boxes
  - Confidence scores
- Applies a confidence threshold (≥ 0.7) to filter out noisy detections
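
The thresholding step can be illustrated on plain Python data. The detection dictionaries below (keys `label`, `score`, `box`) are an assumed format for illustration, not the project's actual data structure. In the Transformers library, the DETR image processor's `post_process_object_detection` accepts a `threshold` argument that serves the same purpose.

```python
def filter_detections(detections, threshold=0.7):
    """Keep detections whose confidence score is at least `threshold`."""
    return [d for d in detections if d["score"] >= threshold]


# Hypothetical detector output; boxes are [x_min, y_min, x_max, y_max] in pixels.
raw = [
    {"label": "laptop", "score": 0.98, "box": [120, 200, 420, 380]},
    {"label": "mouse", "score": 0.91, "box": [450, 330, 500, 370]},
    {"label": "cup", "score": 0.42, "box": [10, 10, 40, 60]},
]

kept = filter_detections(raw)
print([d["label"] for d in kept])
```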

---

### 3. Pairwise Object Processing
- Generates object pairs using `itertools.combinations`
- Extracts bounding boxes for each pair
- Creates union region for relation inference
- Filters duplicate object pairs
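
A minimal sketch of the pairing step, under stated assumptions: detections use the hypothetical `label`/`box` dictionary format, boxes are `[x_min, y_min, x_max, y_max]`, and "duplicate" is taken to mean a pair built from repeated detections of the same object. The README does not specify how duplicates are identified, so that rule is a guess.

```python
from itertools import combinations


def union_box(a, b):
    """Smallest box [x_min, y_min, x_max, y_max] that encloses both a and b."""
    return [min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3])]


def candidate_pairs(detections):
    """Return (subject, object, union_region) for each distinct unordered pair."""
    seen = set()
    pairs = []
    for subj, obj in combinations(detections, 2):
        # Identify a pair by its two (label, box) members, ignoring order.
        key = frozenset({
            (subj["label"], tuple(subj["box"])),
            (obj["label"], tuple(obj["box"])),
        })
        if len(key) < 2 or key in seen:
            continue  # same detection paired with itself, or pair already seen
        seen.add(key)
        pairs.append((subj, obj, union_box(subj["box"], obj["box"])))
    return pairs


detections = [
    {"label": "laptop", "box": [120, 200, 420, 380]},
    {"label": "mouse", "box": [450, 330, 500, 370]},
    {"label": "table", "box": [0, 300, 640, 480]},
    {"label": "mouse", "box": [450, 330, 500, 370]},  # repeated detection
]

pairs = candidate_pairs(detections)
for subj, obj, region in pairs:
    print(subj["label"], obj["label"], region)
```

The union region is what a relation classifier would typically crop from the image, since it contains both objects and the space between them.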

---

### 4. Relationship Prediction
- Custom-trained classifier on Visual Genome subset (~10K samples)
- Predicts semantic relations:
  - `on`, `holding`, `behind`, etc.
- Trained using PyTorch (10 epochs)

---

### 5. Spatial Reasoning (Hybrid AI)
- Uses bounding box geometry to compute:
  - `left_of`, `right_of`, `above`, `below`, `near`
- Hybrid logic:
  - Semantic relations from model (if confident)
  - Otherwise fallback to spatial rules
- Reduces bias (e.g., “everything = on”)
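
The hybrid logic above could look roughly like this. It is a sketch under assumptions: the function names, the center-distance rule for `near`, and both thresholds (`near_px`, `min_conf`) are illustrative choices, not values taken from the project.

```python
import math


def center(box):
    """Center point of a box given as [x_min, y_min, x_max, y_max]."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def spatial_relation(subj_box, obj_box, near_px=50.0):
    """Geometric relation of the subject relative to the object.

    Image coordinates are assumed, so y grows downward.
    `near_px` is an assumed distance threshold in pixels.
    """
    sx, sy = center(subj_box)
    ox, oy = center(obj_box)
    dx, dy = sx - ox, sy - oy
    if math.hypot(dx, dy) <= near_px:
        return "near"
    if abs(dx) >= abs(dy):  # horizontal offset dominates
        return "left_of" if dx < 0 else "right_of"
    return "above" if dy < 0 else "below"


def hybrid_relation(model_label, model_conf, subj_box, obj_box, min_conf=0.6):
    """Use the model's semantic relation if confident, else fall back to geometry."""
    if model_conf >= min_conf:  # `min_conf` is an assumed cutoff
        return model_label
    return spatial_relation(subj_box, obj_box)


laptop = [120, 200, 420, 380]
mouse = [450, 330, 500, 370]
print(hybrid_relation("on", 0.9, mouse, laptop))  # confident model prediction wins
print(hybrid_relation("on", 0.3, mouse, laptop))  # low confidence: geometry decides
```

Falling back to geometry only when the classifier is unsure keeps semantic relations such as `holding` while avoiding a low-confidence default label on every pair.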

---

### 6. Graph Construction
- Builds a **directed graph (NetworkX DiGraph)**
  - Nodes → objects
  - Edges → relationships
- Removes duplicates and limits edges for clarity
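
Graph construction might be sketched as below. The function name, the `relation` edge attribute, and the `max_edges` cap are assumptions for illustration; the README does not state the actual limit or attribute names.

```python
import networkx as nx


def build_scene_graph(triples, max_edges=10):
    """Build a directed graph from (subject, relation, object) triples.

    Keeps the first relation seen for each ordered (subject, object) pair
    and stops once `max_edges` edges have been added.
    """
    graph = nx.DiGraph()
    for subj, relation, obj in triples:
        if graph.number_of_edges() >= max_edges:
            break
        if graph.has_edge(subj, obj):
            continue  # duplicate pair: keep the earlier relation
        graph.add_edge(subj, obj, relation=relation)
    return graph


triples = [
    ("laptop", "on", "table"),
    ("mouse", "next_to", "laptop"),
    ("laptop", "on", "table"),  # duplicate, skipped
    ("mouse", "on", "table"),
]

graph = build_scene_graph(triples)
print(sorted(graph.edges(data="relation")))
```

Because nodes are keyed by label here, two objects with the same label would collapse into one node; a real pipeline would likely need unique node IDs (for example `cup_1`, `cup_2`).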

---

### 7. Graph Visualization
- Uses NetworkX + Matplotlib
- Displays:
  - Directed edges with labels
  - Clean layout for readability
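
One possible way to render such a graph with labeled directed edges is shown below. The styling values, the fixed layout seed, and the `relation` attribute name are assumptions; the function returns PNG bytes so it can run headless (for example on a CPU-only Space).

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend, no display required
import matplotlib.pyplot as plt
import networkx as nx


def render_scene_graph(graph):
    """Draw a scene graph with edge labels and return it as PNG bytes."""
    fig, ax = plt.subplots(figsize=(6, 4))
    pos = nx.spring_layout(graph, seed=42)  # fixed seed for a repeatable layout
    nx.draw_networkx(
        graph, pos, ax=ax, arrows=True,
        node_color="lightblue", node_size=1800, font_size=9,
    )
    # Edge labels come from the assumed "relation" attribute on each edge.
    edge_labels = nx.get_edge_attributes(graph, "relation")
    nx.draw_networkx_edge_labels(graph, pos, edge_labels=edge_labels, ax=ax)
    ax.set_axis_off()
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", bbox_inches="tight")
    plt.close(fig)  # release the figure to avoid leaking memory across requests
    return buffer.getvalue()


demo = nx.DiGraph()
demo.add_edge("laptop", "table", relation="on")
demo.add_edge("mouse", "laptop", relation="next_to")
png_bytes = render_scene_graph(demo)
print(len(png_bytes) > 0)
```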

---

### 8. Graph → Text (NLP)
- Uses `google/flan-t5-small`
- Converts structured triples into natural language

Example:
- `laptop → on → table`
- `mouse → next_to → laptop`

Output:
"A laptop is placed on a table with a mouse next to it."

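Before the triples reach FLAN-T5 they have to be serialized into a text prompt. The sketch below shows one way to do that. The prompt wording and the helper name `triples_to_prompt` are assumptions; the README does not show the actual prompt, and the model call itself is omitted.

```python
def triples_to_prompt(triples):
    """Serialize (subject, relation, object) triples into an instruction prompt."""
    # Replace underscores so relations like "next_to" read as plain English.
    facts = [f"{subj} {rel.replace('_', ' ')} {obj}" for subj, rel, obj in triples]
    return "Describe the scene in one sentence. Facts: " + "; ".join(facts) + "."


prompt = triples_to_prompt([
    ("laptop", "on", "table"),
    ("mouse", "next_to", "laptop"),
])
print(prompt)
```
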
---

### 9. UI (Gradio)
- Upload image
- View:
  - Scene graph
  - Generated description
- Fully interactive and browser-based

---

# πŸ—οΈ Tech Stack
```

- **Computer Vision:** DETR (Hugging Face Transformers)
- **Deep Learning:** PyTorch
- **Graph Processing:** NetworkX
- **NLP:** FLAN-T5
- **Image Processing:** OpenCV
- **Frontend/UI:** Gradio
- **Deployment:** Hugging Face Spaces

```

---

# πŸ“ Project Structure
scene-graph-generator/
β”‚
β”œβ”€β”€ app.py
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ README.md
β”‚
β”œβ”€β”€ src/
β”‚ β”œβ”€β”€ pipeline.py
β”‚ β”œβ”€β”€ detection.py
β”‚ β”œβ”€β”€ spatial_rules.py
β”‚ β”œβ”€β”€ relationship_infer.py
β”‚ β”œβ”€β”€ scene_graph.py
β”‚ β”œβ”€β”€ visualization.py
β”‚ β”œβ”€β”€ text_generation.py

---

# βš™οΈ Installation (Local Setup)

```bash
git clone https://github.com/<your-username>/scene-graph-generator.git
cd scene-graph-generator

pip install -r requirements.txt
python app.py
```