---
title: Scene Graph Generator
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
---
# 🧠 Scene Graph Generator (Multimodal AI System)

A multimodal computer vision system that takes an input image, detects objects, predicts relationships between them, constructs a structured scene graph, and generates a natural language description of the scene.

**Live Demo (Hugging Face Spaces):**
https://huggingface.co/spaces/<your-username>/scene-graph-generator

---
# Features

- Object Detection using DETR (ResNet-50)
- Relationship Prediction (Custom-Trained Model)
- Spatial Reasoning (Hybrid AI with Geometry Rules)
- Scene Graph Construction (Directed Graph)
- Graph Visualization (NetworkX + Matplotlib)
- Graph-to-Text Generation (FLAN-T5)
- Interactive UI (Gradio)
- Deployed on Hugging Face Spaces (CPU)
---
# How It Works (End-to-End Pipeline)
### 1. Input
- User uploads an image (JPG/PNG) via the Gradio UI
- The image is converted from PIL to OpenCV format
---
### 2. Object Detection
- Uses `facebook/detr-resnet-50` from Hugging Face
- Outputs:
  - Object labels (COCO classes)
  - Bounding boxes
  - Confidence scores
- Applies a confidence threshold (≥ 0.7) to filter out low-confidence detections
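
A minimal sketch of the thresholding step, assuming each detection has already been converted into a dict with `label`, `score`, and `box` keys. That layout, the box format, and the `filter_detections` name are illustrative assumptions, not taken from `src/detection.py`. In Hugging Face Transformers, `DetrImageProcessor.post_process_object_detection` also accepts a `threshold` argument that applies the same kind of cut.

```python
from typing import Dict, List

# Threshold taken from this README; the detection dicts below are
# hypothetical stand-ins for the detector's post-processed output.
CONFIDENCE_THRESHOLD = 0.7


def filter_detections(detections: List[Dict], threshold: float = CONFIDENCE_THRESHOLD) -> List[Dict]:
    """Keep only detections whose confidence score is at least `threshold`."""
    return [d for d in detections if d["score"] >= threshold]


# Boxes are assumed to be (x_min, y_min, x_max, y_max) in pixels.
raw = [
    {"label": "laptop", "score": 0.98, "box": (120.0, 80.0, 420.0, 300.0)},
    {"label": "mouse", "score": 0.91, "box": (440.0, 250.0, 500.0, 290.0)},
    {"label": "cup", "score": 0.42, "box": (10.0, 10.0, 60.0, 70.0)},
]
kept = filter_detections(raw)
print([d["label"] for d in kept])  # ['laptop', 'mouse']
```
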
---
### 3. Pairwise Object Processing
- Generates object pairs using `itertools.combinations`
- Extracts bounding boxes for each pair
- Computes the union bounding box of each pair as the region used for relation inference
- Filters duplicate object pairs
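
The pairing step can be sketched with the standard library alone. This is an illustration under stated assumptions: boxes are `(x_min, y_min, x_max, y_max)`, and "duplicate" is taken to mean a repeated unordered label combination. That duplicate rule, like the `make_pairs` and `union_box` names, is hypothetical; the project's own criterion may differ.

```python
from itertools import combinations
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed format
Detection = Tuple[str, Box]              # (label, box)


def union_box(a: Box, b: Box) -> Box:
    """Smallest axis-aligned box that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def make_pairs(objects: List[Detection]) -> List[Tuple[str, str, Box]]:
    """Return (subject, object, union_region) for each unordered pair of detections."""
    seen = set()
    pairs = []
    for (label_a, box_a), (label_b, box_b) in combinations(objects, 2):
        # Hypothetical duplicate rule: keep one pair per unordered label combination.
        key = tuple(sorted((label_a, label_b)))
        if key in seen:
            continue
        seen.add(key)
        pairs.append((label_a, label_b, union_box(box_a, box_b)))
    return pairs


objects = [
    ("laptop", (120.0, 80.0, 420.0, 300.0)),
    ("mouse", (440.0, 250.0, 500.0, 290.0)),
    ("mouse", (445.0, 252.0, 505.0, 292.0)),  # a second "mouse" detection
]
for subj, obj, region in make_pairs(objects):
    print(subj, obj, region)
# laptop mouse (120.0, 80.0, 500.0, 300.0)
# mouse mouse (440.0, 250.0, 505.0, 292.0)
```

Because `combinations` yields each unordered pair once, in input order, the subject/object direction here simply follows detection order. A directed relation model may also need to score the reversed pair.
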
---
### 4. Relationship Prediction
- Classifier custom-trained on a Visual Genome subset (~10K samples)
- Predicts semantic relations:
  - `on`, `holding`, `behind`, etc.
- Trained using PyTorch (10 epochs)
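
The hybrid step below needs a confidence score for each predicted relation. A common way to get one is to apply softmax to the classifier's raw scores. This sketch assumes the model emits one logit per relation class. The three-label vocabulary and the `decode_relation` name are hypothetical. In PyTorch, the equivalent is `torch.softmax(logits, dim=-1)` followed by `.max(dim=-1)`. The version here stays in pure Python so it runs without the trained model.

```python
import math
from typing import List, Tuple

# Hypothetical label set: the README names `on`, `holding`, and `behind`
# as example relations; the real classifier's vocabulary is not listed.
RELATIONS = ["on", "holding", "behind"]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_relation(logits: List[float], labels: List[str] = RELATIONS) -> Tuple[str, float]:
    """Return the highest-probability relation and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


label, confidence = decode_relation([2.0, 0.5, 0.1])
print(label, round(confidence, 2))  # on 0.73
```
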
---
### 5. Spatial Reasoning (Hybrid AI)
- Uses bounding box geometry to compute:
  - `left_of`, `right_of`, `above`, `below`, `near`
- Hybrid logic:
  - Uses the semantic relation from the model (if confident)
  - Otherwise, falls back to spatial rules
- Reduces the model's bias toward a single dominant relation (e.g., predicting `on` for every pair)
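
The hybrid rule can be sketched as follows, assuming boxes in `(x_min, y_min, x_max, y_max)` image coordinates (y grows downward). The center-offset geometry, the `NEAR_FACTOR` and `CONFIDENCE_CUTOFF` values, and the function names are all hypothetical choices for illustration, not the contents of `src/spatial_rules.py`.

```python
import math
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max); y grows downward

# Both constants are hypothetical tuning values, not taken from the project.
NEAR_FACTOR = 0.5        # "near" if centers are closer than this fraction of the mean box side
CONFIDENCE_CUTOFF = 0.6  # minimum model confidence to trust a semantic relation


def center(box: Box) -> Tuple[float, float]:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def spatial_relation(subj: Box, obj: Box) -> str:
    """Describe where `subj` sits relative to `obj` using box geometry only."""
    (sx, sy), (ox, oy) = center(subj), center(obj)
    dx, dy = ox - sx, oy - sy
    mean_side = ((subj[2] - subj[0]) + (subj[3] - subj[1]) + (obj[2] - obj[0]) + (obj[3] - obj[1])) / 4
    if math.hypot(dx, dy) < NEAR_FACTOR * mean_side:
        return "near"
    if abs(dx) >= abs(dy):  # horizontal offset dominates
        return "left_of" if dx > 0 else "right_of"
    return "above" if dy > 0 else "below"  # image y axis points down


def hybrid_relation(pred: Optional[str], conf: float, subj: Box, obj: Box,
                    cutoff: float = CONFIDENCE_CUTOFF) -> str:
    """Use the model's relation when it is confident, otherwise fall back to geometry."""
    if pred is not None and conf >= cutoff:
        return pred
    return spatial_relation(subj, obj)


laptop = (120.0, 80.0, 420.0, 300.0)
mouse = (440.0, 250.0, 500.0, 290.0)
print(hybrid_relation("on", 0.9, laptop, mouse))   # on
print(hybrid_relation("on", 0.35, laptop, mouse))  # left_of
```

A low-confidence `on` is replaced by a geometric relation rather than kept. This is how the fallback limits a classifier that over-predicts one label.
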
---
### 6. Graph Construction
- Builds a **directed graph (NetworkX DiGraph)**
- Nodes → objects
- Edges → relationships
- Removes duplicate edges and caps the number of edges for readability
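
The dedupe-and-cap logic can be sketched without NetworkX as an adjacency map `{subject: {object: relation}}`. The `MAX_EDGES` value and the "first relation wins" policy are hypothetical, since the README does not state either. With NetworkX, each kept edge would instead become `G.add_edge(subj, obj, label=rel)` on an `nx.DiGraph()`. A `DiGraph` already holds at most one edge per ordered pair, and a repeated `add_edge` on the same pair updates that edge's attributes. The explicit check here therefore only decides which relation wins.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)
MAX_EDGES = 10  # hypothetical cap; the README does not state the actual limit


def build_graph(triples: List[Triple], max_edges: int = MAX_EDGES) -> Dict[str, Dict[str, str]]:
    """Adjacency map {subject: {object: relation}} with one edge per ordered pair."""
    graph: Dict[str, Dict[str, str]] = {}
    n_edges = 0
    for subj, rel, obj in triples:
        if n_edges >= max_edges:
            break  # stop once the edge budget is spent
        graph.setdefault(subj, {})
        graph.setdefault(obj, {})
        if obj in graph[subj]:
            continue  # duplicate directed edge: keep the first relation seen
        graph[subj][obj] = rel
        n_edges += 1
    return graph


triples = [
    ("laptop", "on", "table"),
    ("mouse", "next_to", "laptop"),
    ("laptop", "near", "table"),  # same ordered pair as the first triple
    ("cup", "on", "table"),
]
print(build_graph(triples))
# {'laptop': {'table': 'on'}, 'table': {}, 'mouse': {'laptop': 'next_to'}, 'cup': {'table': 'on'}}
```
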
---
### 7. Graph Visualization
- Uses NetworkX + Matplotlib
- Displays:
  - Directed edges with labels
  - A clean layout for readability
---
### 8. Graph → Text (NLP)
- Uses `google/flan-t5-small`
- Converts structured (subject, relation, object) triples into natural language
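
Before FLAN-T5 can describe the graph, the triples have to be flattened into a text prompt. The sketch below shows one way to do that. The prompt template and the `triples_to_prompt` name are hypothetical, not the wording used in `src/text_generation.py`. The generation call is left as a comment because it needs the `transformers` package and a model download.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def triples_to_prompt(triples: List[Triple]) -> str:
    """Flatten triples into an instruction prompt (hypothetical template)."""
    facts = "; ".join(f"{s} {r.replace('_', ' ')} {o}" for s, r, o in triples)
    return f"Describe the scene in one sentence: {facts}."


prompt = triples_to_prompt([("laptop", "on", "table"), ("mouse", "next_to", "laptop")])
print(prompt)
# Describe the scene in one sentence: laptop on table; mouse next to laptop.

# With transformers installed, the prompt could then be passed to the model, e.g.:
#   from transformers import pipeline
#   generator = pipeline("text2text-generation", model="google/flan-t5-small")
#   generator(prompt, max_new_tokens=40)[0]["generated_text"]
```
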

Example input triples:

- `laptop → on → table`
- `mouse → next_to → laptop`

Output:

"A laptop is placed on a table with a mouse next to it."

---
### 9. UI (Gradio)
- Upload image
- View:
  - Scene graph
  - Generated description
- Fully interactive and browser-based
---
# Tech Stack

- **Computer Vision:** DETR (Hugging Face Transformers)
- **Deep Learning:** PyTorch
- **Graph Processing:** NetworkX
- **NLP:** FLAN-T5
- **Image Processing:** OpenCV
- **Frontend/UI:** Gradio
- **Deployment:** Hugging Face Spaces
---
# Project Structure

- `scene-graph-generator/`
  - `app.py`
  - `requirements.txt`
  - `README.md`
  - `src/`
    - `pipeline.py`
    - `detection.py`
    - `spatial_rules.py`
    - `relationship_infer.py`
    - `scene_graph.py`
    - `visualization.py`
    - `text_generation.py`
---
# Installation (Local Setup)
```bash
git clone https://github.com/<your-username>/scene-graph-generator.git
cd scene-graph-generator
pip install -r requirements.txt
python app.py
```