---
title: ESG Document Intelligence Platform
emoji: 🌿
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: HyperRAG + Discourse Graph for ESG Report Analysis
---

# 🌿 Multimodal ESG Document Intelligence Platform

> **HyperRAG + Discourse Graph Reasoning for ESG Report Analysis**

Upload any ESG / Sustainability PDF report and get:

- 💬 **Contextual Q&A** — ask questions about the report, answered with page-level evidence
- 📊 **ESG Pillar Scores** — keyword-based E, S, G scoring + sector detection
- 🚨 **Greenwashing Detection** — flags unsubstantiated claims with exact page references
- 🕸️ **Discourse Graph Insights** — models relationships between claims, evidence, policies and metrics

## Architecture

```
PDF → Text Extraction (pdfplumber)
    → Chunking (400-word windows, 80-word overlap)
    → Embeddings (sentence-transformers/all-MiniLM-L6-v2)
    → Qdrant Vector Index (in-memory)
    → Discourse Graph (NetworkX DiGraph)
         claims ──supported_by──▶ evidence
         policies ──measured_by──▶ metrics
    → HyperRAG Retrieval
         vector search + graph neighbourhood expansion
    → Flan-T5 Answer Generation
```
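
The chunking step in the pipeline above can be sketched as follows. This is a minimal illustration, assuming chunks are built by whitespace tokenisation with a fixed stride of `window - overlap` words; the function name `chunk_words` is hypothetical and the app's actual implementation may differ.

```python
def chunk_words(text, window=400, overlap=80):
    """Split text into overlapping word windows (hypothetical helper)."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    words = text.split()
    step = window - overlap  # 320 words between chunk starts with the defaults
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, consecutive chunks share 80 words, so a sentence that straddles a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.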

## Key Technologies

| Layer | Technology |
|-------|-----------|
| Vector Store | Qdrant (in-memory) |
| Embeddings | `all-MiniLM-L6-v2` |
| LLM | `google/flan-t5-base` |
| Graph | NetworkX DiGraph |
| Retrieval | HyperRAG (vector + graph) |
| UI | Gradio |
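
To make the "vector + graph" retrieval idea concrete, here is a rough sketch of vector search followed by graph neighbourhood expansion. It is an assumption-laden illustration, not the app's code: plain dictionaries stand in for the Qdrant index and the NetworkX graph, and the names `cosine` and `hybrid_retrieve` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_retrieve(query_vec, vectors, graph, top_k=2):
    """Return the top_k nearest chunks plus their direct graph successors.

    vectors: {chunk_id: embedding}   (stand-in for the vector index)
    graph:   {chunk_id: [neighbour chunk_ids]}   (stand-in for the DiGraph)
    """
    ranked = sorted(vectors, key=lambda cid: cosine(query_vec, vectors[cid]),
                    reverse=True)
    seeds = ranked[:top_k]
    results = list(seeds)
    for cid in seeds:
        for neighbour in graph.get(cid, []):
            if neighbour not in results:
                results.append(neighbour)  # add linked evidence or metrics
    return results
```

For example, if a claim chunk is the nearest match and the graph links it to an evidence chunk, the evidence is returned too, even when its embedding is far from the query.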

## Usage

1. **Upload** an ESG report PDF in the *Upload & Process* tab
2. Click **Process Document** — wait ~30–60 s for indexing
3. Switch to any analysis tab and explore!

## Limitations

- ESG scores are keyword-density heuristics (not certified ratings)
- `flan-t5-base` is used for CPU compatibility; swap in a larger model for production
- Greenwashing detection is pattern-based and requires expert review
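
To show why keyword-density scores are only heuristics, the sketch below computes one such score per pillar. The keyword lists and the `pillar_scores` function are hypothetical examples chosen for illustration; the lists and weighting used by the app are not documented here.

```python
import re

# Hypothetical keyword lists, for illustration only.
PILLAR_KEYWORDS = {
    "E": {"emissions", "carbon", "energy", "water", "waste"},
    "S": {"employees", "diversity", "safety", "community", "training"},
    "G": {"board", "audit", "compliance", "ethics", "governance"},
}

def pillar_scores(text):
    """Return keyword hits per 100 words for each ESG pillar."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return {pillar: 0.0 for pillar in PILLAR_KEYWORDS}
    return {
        pillar: 100.0 * sum(token in keywords for token in tokens) / len(tokens)
        for pillar, keywords in PILLAR_KEYWORDS.items()
    }
```

A score like this rewards how often a topic is mentioned, not how well a company performs on it, which is why it should not be read as a rating.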

## Running Locally

```bash
git clone https://huggingface.co/spaces/<your-username>/esg-intelligence
cd esg-intelligence
pip install -r requirements.txt
python app.py
```

## License

Apache 2.0 — research & demonstration use only.