---
title: GraphRAG Doctor
emoji: πŸ•ΈοΈ
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 6.5.0
app_file: app.py
pinned: false
---

πŸ•ΈοΈ GraphRAG Doctor


Enterprise-grade hallucination detection and forensic analysis for GraphRAG (Graph Retrieval-Augmented Generation) systems.

GraphRAG Doctor analyzes LLM-generated responses against retrieved knowledge graph contexts (community summaries, entity triples, and structural metadata) to detect hallucinations, quantify groundedness, and provide actionable diagnostics for pipeline optimization.


## πŸš€ Features

- πŸ” **Multi-Granularity Analysis:** Evaluates evidence at sentence level with semantic similarity scoring
- πŸ“Š **Visual Heatmaps:** Interactive color-coded response highlighting (Green/Yellow/Red evidence zones)
- 🧠 **Dual Model Support:** Toggle between Speed (MiniLM-L6, 30MB) and Accuracy (MPNet-Base, 400MB) embeddings
- ⚑ **Production Optimized:** Async processing, model caching, and memory-efficient batch encoding
- πŸ“ˆ **Comprehensive Metrics:** Graph relevance, groundedness scores, entity drift detection
- πŸ”’ **Enterprise Security:** Input sanitization, HTML escaping, and structured logging
- ☁️ **HF Spaces Optimized:** Defaults to CPU (auto-detects GPU), memory-safe concurrency limits, Secrets integration


## ⚑ Quick Start

### Option A: Deploy to Hugging Face Spaces (Recommended)

1. Click **"Duplicate this Space"** or create a new Space with SDK `gradio`
2. Upload `app.py` and `requirements.txt`
3. Set hardware to CPU (default) or GPU (Settings β†’ Hardware)
4. Your app is live at `https://[username]-[space-name].hf.space`

### Option B: Local Development

```bash
# Clone repository
git clone https://github.com/your-org/graphrag-doctor.git
cd graphrag-doctor

# Install dependencies (see requirements.txt)
pip install gradio sentence-transformers numpy pydantic-settings

# Run locally
python app.py
```

Access at http://localhost:7860

## πŸ€— Hugging Face Spaces

### Secrets Configuration

In your Space Settings β†’ Secrets (not .env files):

| Secret | Value | Description |
|---|---|---|
| `DEBUG` | `false` | Enable verbose JSON logging |
| `DEVICE` | `cpu` | Force CPU (auto-detected if GPU available) |
| `MAX_CONCURRENT_REQUESTS` | `3` | Limit parallel requests (memory safety) |

### Hardware Requirements

- **Free Tier (CPU):** Works well with the "speed" model. First model load takes ~30s (cached thereafter).
- **GPU Tier:** Auto-detected; enables the "accuracy" model for better quality.

### File Structure

```text
β”œβ”€β”€ app.py              # Main application
β”œβ”€β”€ requirements.txt    # Dependencies
└── README.md           # This file
```

## πŸ’» Usage

### Basic Workflow

1. **Input Question:** The original user query sent to your GraphRAG system
2. **Input Graph Context:** Paste the retrieved context (community summaries, entity descriptions, or knowledge triples)
3. **Input Generated Answer:** The LLM response to analyze
4. **Configure Thresholds:**
   - **Grounded Threshold** (default 0.6): Minimum cosine similarity for a "GROUNDED" classification
   - **Weak Threshold** (default 0.4): Minimum similarity for an "UNCERTAIN" classification
   - Below 0.4 = "HALLUCINATION"
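The two thresholds define a simple three-way classifier over per-sentence similarity scores. A minimal sketch (illustrative only; `classify_sentence` is not a function exported by `app.py`):

```python
def classify_sentence(similarity: float,
                      grounded_threshold: float = 0.6,
                      weak_threshold: float = 0.4) -> str:
    """Map a sentence's best cosine similarity against the graph
    context to an evidence label, using the default thresholds above."""
    if similarity >= grounded_threshold:
        return "GROUNDED"
    if similarity >= weak_threshold:
        return "UNCERTAIN"
    return "HALLUCINATION"
```

Raising either threshold trades recall for precision; see the Threshold Tuning Guide below.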

### Input Formats

**Community Summaries** (Microsoft GraphRAG format):

```text
### Community 1
The entity SpaceX is related to Elon Musk through founded_by relationship...

### Community 2
Tesla Inc. operates in the automotive sector with focus on electric vehicles...
```

**Entity/Triple Format:**

```text
Entity: SpaceX (Organization)
Entity: Elon Musk (Person)
[SpaceX] -FOUNDED_BY-> [Elon Musk]
[SpaceX] -HEADQUARTERS-> [Hawthorne, CA]
```

**Raw Text:** Paragraphs separated by double newlines
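The three formats above can be split into evidence units with simple heuristics. A sketch of that dispatch, assuming the community-header, triple-line, and blank-line conventions shown above (`split_graph_context` is illustrative; `app.py`'s parser may differ):

```python
import re

def split_graph_context(context: str) -> list[str]:
    """Split retrieved graph context into evidence units, handling the
    three documented formats: '### Community N' sections, one-per-line
    entity/triple records, and paragraphs separated by blank lines."""
    context = context.strip()
    if re.search(r"^### Community", context, flags=re.MULTILINE):
        # One unit per community section, with the header stripped.
        parts = re.split(r"^### Community \d+\s*$", context, flags=re.MULTILINE)
        return [p.strip() for p in parts if p.strip()]
    if re.search(r"^(Entity:|\[.+\] -)", context, flags=re.MULTILINE):
        # One unit per entity/triple line.
        return [line.strip() for line in context.splitlines() if line.strip()]
    # Raw text: one unit per paragraph (double-newline separated).
    return [p.strip() for p in re.split(r"\n\s*\n", context) if p.strip()]
```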

### Interpreting Results

| Metric | Description | Healthy Range |
|---|---|---|
| Graph Relevance | Max similarity between question and graph context | > 0.6 |
| Groundedness | % of sentences with evidence support | > 80% |
| System Health | Overall pipeline status | Healthy |
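Given cosine-similarity scores, the first two metrics reduce to a max and a thresholded mean. A sketch with assumed names and shapes (`pipeline_metrics`, `question_sims`, `sentence_sims` are illustrative, not `app.py` identifiers):

```python
import numpy as np

def pipeline_metrics(question_sims: np.ndarray,
                     sentence_sims: np.ndarray,
                     grounded_threshold: float = 0.6) -> dict:
    """Compute the headline metrics from cosine similarities.

    question_sims: similarity of the question to each graph unit.
    sentence_sims: (n_sentences, n_units) matrix of answer-sentence
    vs. graph-unit similarities.
    """
    # Graph Relevance: best match between the question and any unit.
    graph_relevance = float(question_sims.max())
    # Groundedness: fraction of sentences whose best-matching unit
    # clears the grounded threshold.
    best_per_sentence = sentence_sims.max(axis=1)
    groundedness = float((best_per_sentence >= grounded_threshold).mean())
    return {"graph_relevance": graph_relevance, "groundedness": groundedness}
```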

**Evidence Status Colors:**

- 🟒 **Green:** Grounded (similarity β‰₯ 0.6). Evidence found in the graph context.
- 🟑 **Yellow:** Uncertain (0.4 ≀ similarity < 0.6). Weak evidence, possible paraphrasing.
- πŸ”΄ **Red:** Hallucination (similarity < 0.4). No supporting evidence detected.

πŸ—οΈ Architecture

```mermaid
graph TD
    A[User Input] --> B{Input Validation}
    B -->|Sanitize| C[Text Normalizer]
    C --> D[Graph Parser]
    D -->|Extract| E[Graph Elements]
    E --> F[Embedding Engine]
    F -->|Async Load| G[Model Manager]
    G -->|Cache| H[SentenceTransformers]
    H --> I[Similarity Matrix]
    I --> J[Evidence Analyzer]
    J -->|Score| K[Sentence Analysis]
    K --> L[Diagnosis Engine]
    L -->|Generate| M[Diagnostics]
    M --> N[HTML Renderer]
    N --> O[Gradio UI]

    style A fill:#e1f5ff
    style O fill:#d4edda
    style G fill:#fff3cd
```

### Component Overview

| Component | Responsibility |
|---|---|
| `ModelManager` | Singleton model cache with LRU eviction (max 1 model for HF Spaces memory) |
| `TextNormalizer` | Input validation, sentence segmentation, graph parsing |
| `EvidenceAnalyzer` | Async embedding generation, similarity computation, scoring logic |
| `HTMLRenderer` | Secure HTML generation, XSS prevention, responsive layouts |
| `GraphRAGDoctorApp` | Request orchestration, semaphore concurrency, error handling |
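The semaphore-based concurrency in the orchestration layer amounts to gating each analysis on a bounded pool of slots. A minimal sketch in the spirit of `GraphRAGDoctorApp` (class and function names here are illustrative, not taken from `app.py`):

```python
import asyncio

class AnalysisGate:
    """Bound concurrent analyses so embedding memory stays predictable."""

    def __init__(self, max_concurrent: int = 3):
        self._sem = asyncio.Semaphore(max_concurrent)

    async def run(self, analyze, *args):
        # Requests beyond the limit wait here until a slot frees up.
        async with self._sem:
            return await analyze(*args)

async def demo():
    gate = AnalysisGate(max_concurrent=2)

    async def fake_analysis(i):
        await asyncio.sleep(0.01)  # stand-in for embedding + scoring
        return i * 2

    # Five requests, but at most two run at any moment.
    return await asyncio.gather(*(gate.run(fake_analysis, i) for i in range(5)))
```

The `max_concurrent` default mirrors the `MAX_CONCURRENT_REQUESTS=3` setting recommended for HF Spaces CPU.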

### Tech Stack

- **Backend:** Python 3.9+, asyncio, Pydantic Settings v2
- **ML:** SentenceTransformers, PyTorch (CPU/GPU auto-detect)
- **Frontend:** Gradio 4.0+
- **Validation:** Pydantic with environment variable support

βš™οΈ Configuration

### Environment Variables

Set via Hugging Face Space Secrets (or .env locally):

| Variable | Default | Description |
|---|---|---|
| `DEBUG` | `false` | Enable verbose JSON logging |
| `DEVICE` | `cpu`/`cuda` | Compute device (auto-detected) |
| `MAX_CONCURRENT_REQUESTS` | `3` | Parallel analysis limit (set to 3 for HF Spaces CPU) |
| `MAX_INPUT_LENGTH` | `50000` | Character limit per input field |
| `BATCH_SIZE` | `32` | Encoding batch size |
| `DEFAULT_MODEL` | `speed` | Default embedding model |
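The app reads these variables through Pydantic Settings; the loading behavior can be sketched with the stdlib alone. The `Settings` dataclass and `load_settings` function below are a stand-in, not the actual `app.py` classes:

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    """Each field falls back to its documented default when unset."""
    debug: bool
    device: str
    max_concurrent_requests: int
    max_input_length: int
    batch_size: int
    default_model: str

def load_settings(env=os.environ) -> Settings:
    return Settings(
        debug=env.get("DEBUG", "false").lower() == "true",
        device=env.get("DEVICE", "cpu"),
        max_concurrent_requests=int(env.get("MAX_CONCURRENT_REQUESTS", "3")),
        max_input_length=int(env.get("MAX_INPUT_LENGTH", "50000")),
        batch_size=int(env.get("BATCH_SIZE", "32")),
        default_model=env.get("DEFAULT_MODEL", "speed"),
    )
```

On HF Spaces these values come from Space Secrets rather than a `.env` file, but the precedence (explicit value over default) is the same.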

### Threshold Tuning Guide

**High Precision Mode** (minimize false positives):

```python
GREEN_THRESHOLD = 0.75
YELLOW_THRESHOLD = 0.50
```

**High Recall Mode** (catch all potential hallucinations):

```python
GREEN_THRESHOLD = 0.50
YELLOW_THRESHOLD = 0.30
```

**HF Spaces Note:** On CPU, stick with the "speed" model (MiniLM-L6) for <2s response times. Use "accuracy" (MPNet) only on the GPU tier.

πŸ› Troubleshooting

### Model Loading on HF Spaces

**Symptom:** Timeout on first request

**Solution:**

- First model download takes 30-60s on Spaces (cached for subsequent requests)
- If the timeout persists, upgrade to the CPU Upgrade tier or use the speed model only

### Low GPU Memory on Spaces

**Symptom:** `CUDA out of memory`

**Solution:**

1. Switch to CPU: set `DEVICE=cpu` in Secrets
2. Or reduce `BATCH_SIZE` to 16 in code
3. Ensure only one model is loaded at a time (the default behavior)

### High Latency on Free Tier

**Optimization:**

- Use the speed model (default): ~100-300ms per analysis
- Avoid long inputs (>1000 characters), which increase embedding time
- Enable Persistent Storage in Space Settings to cache downloaded models

### Import Error: pydantic_settings

**Symptom:** `ModuleNotFoundError: No module named 'pydantic_settings'`

**Solution:** Ensure `requirements.txt` includes:

```text
pydantic>=2.0.0
pydantic-settings>=2.0.0
```

### Low Graph Relevance Scores

**Symptom:** Graph Relevance consistently < 0.4

**Diagnostics:**

- Verify that GraphRAG retrieval is working (check community detection levels)
- Ensure the context includes entity summaries, not just raw text
- Tune similarity thresholds based on your domain's semantic similarity distribution

## 🀝 Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## πŸ“„ License

Distributed under the MIT License. See LICENSE for more information.

πŸ™ Acknowledgments


Built with ❀️ for the GraphRAG community