Kalp Kanungo

metadata
title: Scene Graph Generator
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false

🧠 Scene Graph Generator (Multimodal AI System)

A multimodal computer vision system that takes an input image, detects objects, predicts relationships between them, constructs a structured scene graph, and generates a natural language description of the scene.

🔗 Live Demo (Hugging Face Spaces):
https://huggingface.co/spaces//scene-graph-generator


🚀 Features

  • 🖼️ Object Detection using DETR (ResNet-50)
  • 🔗 Relationship Prediction (Custom-Trained Model)
  • 📐 Spatial Reasoning (Hybrid AI with Geometry Rules)
  • 🧩 Scene Graph Construction (Directed Graph)
  • 📊 Graph Visualization (NetworkX + Matplotlib)
  • 🧠 Graph-to-Text Generation (FLAN-T5)
  • 🌐 Interactive UI (Gradio)
  • ☁️ Deployed on Hugging Face Spaces (CPU)

🧠 How It Works (End-to-End Pipeline)

1. Input

  • User uploads an image (JPG/PNG) via the Gradio UI
  • The image is converted from PIL → OpenCV format
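OpenCV works on NumPy arrays in BGR channel order, while PIL images are RGB, so the conversion has to flip the channels as well as change the type. A minimal sketch, assuming the input is a PIL image; the helper name `pil_to_bgr` is hypothetical and not taken from `app.py`:

```python
import numpy as np
from PIL import Image


def pil_to_bgr(image: Image.Image) -> np.ndarray:
    """Convert a PIL image to an OpenCV-style BGR uint8 array (hypothetical helper)."""
    # Normalise to 3-channel RGB first (handles RGBA PNGs and grayscale inputs).
    rgb = np.asarray(image.convert("RGB"))
    # Reverse the channel axis RGB -> BGR; equivalent to
    # cv2.cvtColor(rgb, cv2.COLOR_RGB2BGR). ascontiguousarray gives OpenCV a
    # C-contiguous buffer instead of a negative-stride view.
    return np.ascontiguousarray(rgb[:, :, ::-1])


if __name__ == "__main__":
    red = Image.new("RGB", (2, 1), (255, 0, 0))  # width=2, height=1
    print(pil_to_bgr(red).shape)             # (1, 2, 3)
    print(pil_to_bgr(red)[0, 0].tolist())    # [0, 0, 255]
```

Note that NumPy arrays are indexed (height, width, channel), whereas PIL sizes are given as (width, height).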

2. Object Detection

  • Uses facebook/detr-resnet-50 from Hugging Face
  • Outputs:
    • Object labels (COCO classes)
    • Bounding boxes
    • Confidence scores
  • Keeps only detections with confidence ≥ 0.7 to filter out noise
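One way to run the detector and apply the 0.7 cut-off is the Transformers `pipeline("object-detection")` API, whose results are dicts with `score`, `label`, and a `box` holding `xmin`/`ymin`/`xmax`/`ymax`. This is a sketch, not necessarily how `detection.py` does it (it may call `DetrForObjectDetection` directly); `filter_detections` and `detect_objects` are hypothetical names. The `transformers` import sits inside the cached loader so the filtering logic can be used without downloading the model.

```python
from functools import lru_cache
from typing import Any

# (label, (x1, y1, x2, y2), score)
Detection = tuple[str, tuple[int, int, int, int], float]


def filter_detections(raw: list[dict[str, Any]], threshold: float = 0.7) -> list[Detection]:
    """Keep detections with score >= threshold, flattened to (label, box, score)."""
    kept: list[Detection] = []
    for det in raw:
        if det["score"] >= threshold:
            box = det["box"]
            kept.append((det["label"], (box["xmin"], box["ymin"], box["xmax"], box["ymax"]), det["score"]))
    return kept


@lru_cache(maxsize=1)
def _get_detector():
    # Lazy import and cache: the model is loaded once, on first use.
    from transformers import pipeline
    return pipeline("object-detection", model="facebook/detr-resnet-50")


def detect_objects(image, threshold: float = 0.7) -> list[Detection]:
    """Run DETR on a PIL image and apply the confidence threshold."""
    # The pipeline applies its own default cut-off; filtering again enforces 0.7.
    return filter_detections(_get_detector()(image), threshold)


if __name__ == "__main__":
    raw = [
        {"score": 0.92, "label": "laptop", "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 90}},
        {"score": 0.55, "label": "cup", "box": {"xmin": 0, "ymin": 0, "xmax": 5, "ymax": 5}},
    ]
    print(filter_detections(raw))  # [('laptop', (10, 20, 110, 90), 0.92)]
```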

3. Pairwise Object Processing

  • Generates object pairs using itertools.combinations
  • Extracts bounding boxes for each pair
  • Computes the union bounding box of each pair as the region for relation inference
  • Filters duplicate object pairs
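A minimal sketch of the pairing step, assuming each detection has been reduced to a `(label, (x1, y1, x2, y2))` tuple. The README does not say how a "duplicate" pair is defined; here it is assumed to mean a repeated unordered label pair, so a second cup/table pair is skipped. `union_box` and `make_pairs` are hypothetical names.

```python
from itertools import combinations

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def union_box(a: Box, b: Box) -> Box:
    """Smallest box that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def make_pairs(objects: list[tuple[str, Box]]) -> list[tuple[str, str, Box]]:
    """Return (label_a, label_b, union_box) for each unique unordered label pair."""
    seen: set[tuple[str, str]] = set()
    pairs: list[tuple[str, str, Box]] = []
    for (label_a, box_a), (label_b, box_b) in combinations(objects, 2):
        key = tuple(sorted((label_a, label_b)))  # assumed duplicate criterion
        if key in seen:
            continue
        seen.add(key)
        pairs.append((label_a, label_b, union_box(box_a, box_b)))
    return pairs


if __name__ == "__main__":
    objs = [("cup", (0, 0, 1, 1)), ("cup", (2, 2, 3, 3)), ("table", (0, 0, 5, 5))]
    print(make_pairs(objs))  # [('cup', 'cup', (0, 0, 3, 3)), ('cup', 'table', (0, 0, 5, 5))]
```

Because `combinations` yields each unordered pair only once, which object acts as the subject of a relation has to be decided later, by the classifier or the spatial rules.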

4. Relationship Prediction

  • Custom classifier trained on a Visual Genome subset (~10K samples)
  • Predicts semantic relations:
    • on, holding, behind, etc.
  • Trained with PyTorch for 10 epochs
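The classifier's architecture is not described here, but the hybrid step below needs its raw scores turned into a relation label plus a confidence. A minimal sketch, assuming the model emits one logit per relation class; the three-label vocabulary and the name `decode_relation` are hypothetical.

```python
import math
from collections.abc import Sequence

RELATIONS = ["on", "holding", "behind"]  # hypothetical label set


def decode_relation(logits: Sequence[float], labels: Sequence[str] = RELATIONS) -> tuple[str, float]:
    """Softmax the logits and return (top label, its probability)."""
    if len(logits) != len(labels):
        raise ValueError("need exactly one logit per label")
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    best = max(range(len(exps)), key=exps.__getitem__)
    return labels[best], exps[best] / total


if __name__ == "__main__":
    label, conf = decode_relation([2.0, 0.0, 0.0])
    print(label, round(conf, 3))  # on 0.787
```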

5. Spatial Reasoning (Hybrid AI)

  • Uses bounding box geometry to compute:
    • left_of, right_of, above, below, near
  • Hybrid logic:
    • Uses the model's semantic relation when its confidence is high enough
    • Otherwise, falls back to the spatial rules
  • Reduces the model's bias toward a single dominant relation (e.g., labeling everything “on”)
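The geometry rules and the confidence-based fallback can be sketched as follows. The thresholds (`near_ratio=0.5`, `conf_threshold=0.6`) and the rule order are assumptions for illustration, not values from `spatial_rules.py`; image coordinates are assumed to grow downward, as they do in both PIL and OpenCV.

```python
import math

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), y grows downward


def spatial_relation(a: Box, b: Box, near_ratio: float = 0.5) -> str:
    """Describe where box `a` (subject) sits relative to box `b` (object)."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    dx, dy = bx - ax, by - ay
    # The largest box side sets the scale for "near" (assumed heuristic).
    scale = max(a[2] - a[0], a[3] - a[1], b[2] - b[0], b[3] - b[1])
    if math.hypot(dx, dy) < near_ratio * scale:
        return "near"
    # Otherwise use whichever axis separates the centres more.
    if abs(dx) >= abs(dy):
        return "left_of" if dx > 0 else "right_of"
    return "above" if dy > 0 else "below"


def hybrid_relation(model_rel: str, model_conf: float, a: Box, b: Box,
                    conf_threshold: float = 0.6) -> str:
    """Trust the classifier when it is confident, else fall back to geometry."""
    if model_conf >= conf_threshold:
        return model_rel
    return spatial_relation(a, b)


if __name__ == "__main__":
    print(hybrid_relation("on", 0.91, (0, 0, 10, 10), (100, 0, 110, 10)))  # on
    print(hybrid_relation("on", 0.30, (0, 0, 10, 10), (100, 0, 110, 10)))  # left_of
```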

6. Graph Construction

  • Builds a directed graph (NetworkX DiGraph)
    • Nodes → objects
    • Edges → relationships
  • Removes duplicate edges and caps the number of edges to keep the graph readable
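A sketch of graph construction with NetworkX, assuming nodes are keyed by object label. Since a `DiGraph` stores at most one edge per ordered node pair, this version keeps the first relation for each (subject, object) pair, skips the self-loops that two same-label objects would create, and stops at a hypothetical cap of 10 edges; these rules are assumptions, not taken from `scene_graph.py`. NetworkX is imported inside `build_scene_graph` so the de-duplication helper has no hard dependency on it.

```python
from collections.abc import Iterable

Triple = tuple[str, str, str]  # (subject, relation, object)


def unique_triples(triples: Iterable[Triple], max_edges: int = 10) -> list[Triple]:
    """Keep the first relation per (subject, object) pair, up to max_edges."""
    seen: set[tuple[str, str]] = set()
    kept: list[Triple] = []
    for subj, rel, obj in triples:
        if len(kept) >= max_edges:
            break
        if subj == obj or (subj, obj) in seen:  # drop self-loops and repeats
            continue
        seen.add((subj, obj))
        kept.append((subj, rel, obj))
    return kept


def build_scene_graph(triples: Iterable[Triple], max_edges: int = 10):
    """Return a networkx.DiGraph with the relation stored on each edge."""
    import networkx as nx

    graph = nx.DiGraph()
    for subj, rel, obj in unique_triples(triples, max_edges):
        graph.add_edge(subj, obj, relation=rel)
    return graph


if __name__ == "__main__":
    triples = [("laptop", "on", "table"), ("laptop", "near", "table"), ("mouse", "next_to", "laptop")]
    print(unique_triples(triples))  # [('laptop', 'on', 'table'), ('mouse', 'next_to', 'laptop')]
```

Storing the relation as an edge attribute makes it easy to pull out later as edge labels for drawing.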

7. Graph Visualization

  • Uses NetworkX + Matplotlib
  • Displays:
    • Directed edges with labels
    • Clean layout for readability

8. Graph → Text (NLP)

  • Uses google/flan-t5-small
  • Converts structured triples into natural language

Example triples:
  • laptop → on → table
  • mouse → next_to → laptop

Output: "A laptop is placed on a table with a mouse next to it."
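A sketch of the triple-to-text step with `google/flan-t5-small`, loaded through `AutoTokenizer` and `AutoModelForSeq2SeqLM`. The prompt template in `triples_to_prompt` is hypothetical; the wording actually used in `text_generation.py` may differ, and FLAN-T5 is sensitive to it, so generated sentences will not necessarily match the sample output above. The model is imported lazily so the prompt builder runs without Transformers installed.

```python
from collections.abc import Iterable

Triple = tuple[str, str, str]  # (subject, relation, object)


def triples_to_prompt(triples: Iterable[Triple]) -> str:
    """Flatten triples into a plain-English instruction (hypothetical template)."""
    facts = ". ".join(f"{s} {p.replace('_', ' ')} {o}" for s, p, o in triples)
    return f"Describe the scene in one sentence: {facts}."


def describe_scene(triples: Iterable[Triple], model_name: str = "google/flan-t5-small",
                   max_new_tokens: int = 60) -> str:
    """Generate a scene description with FLAN-T5 (downloads the model on first use)."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(triples_to_prompt(triples), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    triples = [("laptop", "on", "table"), ("mouse", "next_to", "laptop")]
    print(triples_to_prompt(triples))
    # Describe the scene in one sentence: laptop on table. mouse next to laptop.
```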


9. UI (Gradio)

  • Upload image
  • View:
    • Scene graph
    • Generated description
  • Fully interactive and browser-based

🏗️ Tech Stack


- **Computer Vision:** DETR (Hugging Face Transformers)
- **Deep Learning:** PyTorch
- **Graph Processing:** NetworkX
- **NLP:** FLAN-T5
- **Image Processing:** OpenCV
- **Frontend/UI:** Gradio
- **Deployment:** Hugging Face Spaces

📁 Project Structure

scene-graph-generator/
  • app.py
  • requirements.txt
  • README.md
  • src/
    • pipeline.py
    • detection.py
    • spatial_rules.py
    • relationship_infer.py
    • scene_graph.py
    • visualization.py
    • text_generation.py


⚙️ Installation (Local Setup)

git clone https://github.com/<your-username>/scene-graph-generator.git
cd scene-graph-generator

pip install -r requirements.txt
python app.py