metadata
title: ESG Document Intelligence Platform
emoji: 🌿
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: HyperRAG + Discourse Graph for ESG Report Analysis
🌿 Multimodal ESG Document Intelligence Platform
HyperRAG + Discourse Graph Reasoning for ESG Report Analysis
Upload any ESG / Sustainability PDF report and get:
- 💬 Contextual Q&A — ask questions about the report, answered with page-level evidence
- 📊 ESG Pillar Scores — keyword-based E, S, G scoring + sector detection
- 🚨 Greenwashing Detection — flags unsubstantiated claims with exact page references
- 🕸️ Discourse Graph Insights — models relationships between claims, evidence, policies and metrics
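The pillar scores above are described as keyword-based. A minimal sketch of what such a scorer might look like is shown here; the keyword lists, the `score_pillars` name and the per-1,000-words normalisation are illustrative assumptions, not the platform's actual vocabulary or formula.

```python
import re
from collections import Counter

# Hypothetical keyword lists; the application's real vocabulary is not documented here.
PILLAR_KEYWORDS = {
    "E": {"emissions", "carbon", "energy", "water", "waste"},
    "S": {"employees", "diversity", "safety", "community", "training"},
    "G": {"board", "audit", "compliance", "ethics", "governance"},
}

def score_pillars(text: str) -> dict[str, float]:
    """Return keyword hits per 1,000 words for each ESG pillar."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return {pillar: 0.0 for pillar in PILLAR_KEYWORDS}
    counts = Counter(words)
    return {
        pillar: 1000 * sum(counts[k] for k in keywords) / len(words)
        for pillar, keywords in PILLAR_KEYWORDS.items()
    }

sample = "Carbon emissions fell while the board reviewed audit findings"
scores = score_pillars(sample)
```

A density score like this rewards reports that merely mention a topic often, which is one reason it should be read as a rough signal rather than a rating.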
Architecture
PDF → Text Extraction (pdfplumber)
    → Chunking (400-word windows, 80-word overlap)
    → Embeddings (sentence-transformers/all-MiniLM-L6-v2)
    → Qdrant Vector Index (in-memory)
    → Discourse Graph (NetworkX DiGraph)
          claims ──supported_by──▶ evidence
          policies ──measured_by──▶ metrics
    → HyperRAG Retrieval
          vector search + graph neighbourhood expansion
    → Flan-T5 Answer Generation
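The chunking step in the pipeline above (400-word windows with an 80-word overlap) can be sketched in plain Python. The function name `chunk_words` and the whitespace-based word splitting are assumptions for illustration; the application may tokenise differently.

```python
def chunk_words(text: str, window: int = 400, overlap: int = 80) -> list[str]:
    """Split text into overlapping windows of whitespace-separated words."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    words = text.split()
    step = window - overlap  # 320 words with the defaults
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break  # this window already reached the end of the text
    return chunks

# A 1,000-word toy document: w0 w1 ... w999
document = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_words(document)
```

With the default settings each window starts 320 words after the previous one, so consecutive chunks share their boundary 80 words and a sentence that straddles a boundary still appears whole in at least one chunk.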
Key Technologies
| Layer | Technology |
|---|---|
| Vector Store | Qdrant (in-memory) |
| Embeddings | all-MiniLM-L6-v2 |
| LLM | google/flan-t5-base |
| Graph | NetworkX DiGraph |
| Retrieval | HyperRAG (vector + graph) |
| UI | Gradio |
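The Retrieval row combines vector search with graph neighbourhood expansion. The sketch below illustrates that idea using only the standard library: short lists stand in for the MiniLM embeddings and the Qdrant index, and an adjacency dict stands in for the NetworkX DiGraph. The function names, the one-hop expansion and the toy data are assumptions made for illustration; the application's actual retrieval logic may differ.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_retrieve(query_vec, vectors, edges, top_k=2):
    """Pick the top_k nodes by similarity, then add their one-hop successors.

    vectors: node id -> embedding
    edges:   node id -> list of (relation, target node id)
    """
    ranked = sorted(vectors, key=lambda n: cosine(query_vec, vectors[n]), reverse=True)
    seeds = ranked[:top_k]
    results = list(seeds)
    for node in seeds:
        for _relation, target in edges.get(node, []):
            if target not in results:
                results.append(target)
    return results

# Toy data mirroring the edge types in the architecture diagram (hypothetical ids).
vectors = {
    "claim_1": [0.9, 0.1],
    "evidence_1": [0.2, 0.8],
    "policy_1": [0.7, 0.3],
    "metric_1": [0.1, 0.9],
}
edges = {
    "claim_1": [("supported_by", "evidence_1")],
    "policy_1": [("measured_by", "metric_1")],
}
hits = hybrid_retrieve([1.0, 0.0], vectors, edges, top_k=1)
```

Here `evidence_1` is dissimilar to the query vector, yet it is returned because the graph links it to the best-matching claim. That is the benefit graph expansion is meant to add over vector search alone.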
Usage
- Upload an ESG report PDF in the Upload & Process tab
- Click Process Document — wait ~30–60 s for indexing
- Switch to any analysis tab and explore!
Limitations
- ESG scores are keyword-density heuristics (not certified ratings)
- flan-t5-base is used for CPU compatibility; swap in a larger model for production
- Greenwashing detection is pattern-based and requires expert review
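To make the pattern-based limitation concrete, here is a rough sketch of how such a detector might work: it flags sentences that contain a vague sustainability phrase but no number. The phrase list, the "no digit means unsubstantiated" rule and the `flag_unsubstantiated` name are all assumptions for illustration, not the application's actual rules.

```python
import re

# Hypothetical vague-claim phrases; a real list would need expert curation.
VAGUE_TERMS = re.compile(
    r"\b(eco-friendly|environmentally friendly|carbon neutral|net[- ]zero|sustainable)\b",
    re.IGNORECASE,
)
HAS_NUMBER = re.compile(r"\d")

def flag_unsubstantiated(pages: list[str]) -> list[tuple[int, str]]:
    """Return (page_number, sentence) pairs with a vague claim and no figure."""
    flagged = []
    for page_no, page_text in enumerate(pages, start=1):
        # Naive sentence split on terminal punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", page_text):
            if VAGUE_TERMS.search(sentence) and not HAS_NUMBER.search(sentence):
                flagged.append((page_no, sentence.strip()))
    return flagged

pages = [
    "We are a sustainable company. Emissions fell 12% in 2023.",
    "Our net-zero target covers 80% of sites. Our packaging is eco-friendly!",
]
flags = flag_unsubstantiated(pages)
```

A rule this simple produces both false positives (a claim substantiated in the next sentence) and false negatives (a meaningless number attached to a vague claim), which is why flagged passages need expert review.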
Running Locally
git clone https://huggingface.co/spaces/<your-username>/esg-intelligence
cd esg-intelligence
pip install -r requirements.txt
python app.py
License
Apache 2.0 — research & demonstration use only.